We will clean the environment, setup the locations, define colors, and create a datestamp.
Clean the environment.
Set locations and working directories…
Create a new analysis directory...
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE
[1] "/Users/slaan3/OneDrive - UMC Utrecht/PLINK/analyses/lookups/AE_TEMPLATE"
[1] "20210929.SOME_FANCY_PROJECTNAME.results.RData" "AE_TEMPLATE.Rmd"
[3] "AE_TEMPLATE.Rproj" "bulkRNAseq"
[5] "images" "LICENSE"
[7] "README_original.md" "README.md"
[9] "renv" "renv.lock"
[11] "scripts" "scRNAseq"
[13] "SNP" "SOME_FANCY_PROJECTNAME"
… a package-installation function …
install.packages.auto <- function(x) {
x <- as.character(substitute(x))
if(isTRUE(x %in% .packages(all.available = TRUE))) {
eval(parse(text = sprintf("require(\"%s\")", x)))
} else {
# Update installed packages - this may mean a full upgrade of R, which in turn
# may not be warrented.
#update.install.packages.auto(ask = FALSE)
eval(parse(text = sprintf("install.packages(\"%s\", dependencies = TRUE, repos = \"https://cloud.r-project.org/\")", x)))
}
if(isTRUE(x %in% .packages(all.available = TRUE))) {
eval(parse(text = sprintf("require(\"%s\")", x)))
} else {
if (!requireNamespace("BiocManager"))
install.packages("BiocManager")
BiocManager::install() # this would entail updating installed packages, which in turned may not be warrented
# Code for older versions of R (<3.5.0)
# source("http://bioconductor.org/biocLite.R")
# Update installed packages - this may mean a full upgrade of R, which in turn
# may not be warrented.
# biocLite(character(), ask = FALSE)
eval(parse(text = sprintf("BiocManager::install(\"%s\")", x)))
eval(parse(text = sprintf("require(\"%s\")", x)))
}
}… and load those packages.
install.packages.auto("readr")
install.packages.auto("optparse")
install.packages.auto("tools")
install.packages.auto("dplyr")
install.packages.auto("tidyr")
install.packages.auto("naniar")
# To get 'data.table' with 'fwrite' to be able to directly write gzipped-files
# Ref: https://stackoverflow.com/questions/42788401/is-possible-to-use-fwrite-from-data-table-with-gzfile
# install.packages("data.table", repos = "https://Rdatatable.gitlab.io/data.table")
library(data.table)
install.packages.auto("tidyverse")
install.packages.auto("knitr")
install.packages.auto("DT")
install.packages.auto("eeptools")
install.packages.auto("haven")
install.packages.auto("tableone")
install.packages.auto("BlandAltmanLeh")
# Install the devtools package from Hadley Wickham
install.packages.auto('devtools')
# for plotting
install.packages.auto("pheatmap")
install.packages.auto("forestplot")
install.packages.auto("ggplot2")
install.packages.auto("ggpubr")
install.packages.auto("UpSetR")
devtools::install_github("thomasp85/patchwork")We will create a datestamp and define the Utrecht Science Park Colour Scheme.
Today = format(as.Date(as.POSIXlt(Sys.time())), "%Y%m%d")
Today.Report = format(as.Date(as.POSIXlt(Sys.time())), "%A, %B %d, %Y")
### UtrechtScienceParkColoursScheme
###
### WebsitetoconvertHEXtoRGB:http://hex.colorrrs.com.
### Forsomefunctionsyoushoulddividethesenumbersby255.
###
### No. Color HEX (RGB) CHR MAF/INFO
###---------------------------------------------------------------------------------------
### 1 yellow #FBB820 (251,184,32) => 1 or 1.0>INFO
### 2 gold #F59D10 (245,157,16) => 2
### 3 salmon #E55738 (229,87,56) => 3 or 0.05<MAF<0.2 or 0.4<INFO<0.6
### 4 darkpink #DB003F ((219,0,63) => 4
### 5 lightpink #E35493 (227,84,147) => 5 or 0.8<INFO<1.0
### 6 pink #D5267B (213,38,123) => 6
### 7 hardpink #CC0071 (204,0,113) => 7
### 8 lightpurple #A8448A (168,68,138) => 8
### 9 purple #9A3480 (154,52,128) => 9
### 10 lavendel #8D5B9A (141,91,154) => 10
### 11 bluepurple #705296 (112,82,150) => 11
### 12 purpleblue #686AA9 (104,106,169) => 12
### 13 lightpurpleblue #6173AD (97,115,173/101,120,180) => 13
### 14 seablue #4C81BF (76,129,191) => 14
### 15 skyblue #2F8BC9 (47,139,201) => 15
### 16 azurblue #1290D9 (18,144,217) => 16 or 0.01<MAF<0.05 or 0.2<INFO<0.4
### 17 lightazurblue #1396D8 (19,150,216) => 17
### 18 greenblue #15A6C1 (21,166,193) => 18
### 19 seaweedgreen #5EB17F (94,177,127) => 19
### 20 yellowgreen #86B833 (134,184,51) => 20
### 21 lightmossgreen #C5D220 (197,210,32) => 21
### 22 mossgreen #9FC228 (159,194,40) => 22 or MAF>0.20 or 0.6<INFO<0.8
### 23 lightgreen #78B113 (120,177,19) => 23/X
### 24 green #49A01D (73,160,29) => 24/Y
### 25 grey #595A5C (89,90,92) => 25/XY or MAF<0.01 or 0.0<INFO<0.2
### 26 lightgrey #A2A3A4 (162,163,164) => 26/MT
###
### ADDITIONAL COLORS
### 27 midgrey #D7D8D7
### 28 verylightgrey #ECECEC"
### 29 white #FFFFFF
### 30 black #000000
###----------------------------------------------------------------------------------------------
uithof_color = c("#FBB820","#F59D10","#E55738","#DB003F","#E35493","#D5267B",
"#CC0071","#A8448A","#9A3480","#8D5B9A","#705296","#686AA9",
"#6173AD","#4C81BF","#2F8BC9","#1290D9","#1396D8","#15A6C1",
"#5EB17F","#86B833","#C5D220","#9FC228","#78B113","#49A01D",
"#595A5C","#A2A3A4", "#D7D8D7", "#ECECEC", "#FFFFFF", "#000000")
uithof_color_legend = c("#FBB820", "#F59D10", "#E55738", "#DB003F", "#E35493",
"#D5267B", "#CC0071", "#A8448A", "#9A3480", "#8D5B9A",
"#705296", "#686AA9", "#6173AD", "#4C81BF", "#2F8BC9",
"#1290D9", "#1396D8", "#15A6C1", "#5EB17F", "#86B833",
"#C5D220", "#9FC228", "#78B113", "#49A01D", "#595A5C",
"#A2A3A4", "#D7D8D7", "#ECECEC", "#FFFFFF", "#000000")
### ----------------------------------------------------------------------------Collaborators
[First name] [Initials] [Last name] [First name] [Initials] [Last name] [First name] [Initials] [Last name]
Athero-Express Team
Sander W. van der Laan, Arjan Boltjes, Michal Mokry, Hester den Ruijter, Folkert W. Asselbergs, Gert Jan de Borst, Gerard Pasterkamp.
Project ID [AE_[YYYYMMDD]_[PROJECTNUMBER]_[LEADCOLLABORATOR]_[PROJECTNAME]]
Collaboration to study common variants for PCSK9 in relation to atherosclerotic plaques characteristics. The main list of variants are given below.
Genes.xlsx - list of genes of interest.Variants.xlsx - list of variants of interest.library(openxlsx)
gene_list_df <- read.xlsx(paste0(PROJECT_loc, "/SNP/Genes.xlsx"), sheet = "Genes")
gene_list <- unlist(gene_list_df$Gene)
gene_list
variant_list <- read.xlsx(paste0(PROJECT_loc, "/SNP/Variants.xlsx"), sheet = "Variants")
DT::datatable(variant_list)We will test the hypothesis that common variants PCSK9 are associated with plaque characteristics. We will use data from the Athero-Express Biobank Study.
These are the questions we will address:
For the genetic analyses we will perform regression analyses adjusted for age, sex (where applicable) and principal components. So, we will apply the following model:
We will perform regression analyses adjusted for age, sex (where applicable) and principal components.
model 1: phenotype ~ age + sex + chip-used + PC1 + PC2 + year-of-surgery
We can also run sex-stratified analyses:
model 1m: phenotype ~ age + chip-used + PC1 + PC2 + year-of-surgery males only model 1f: phenotype ~ age + chip-used + PC1 + PC2 + year-of-surgery females only
phenotypes are:
calcification, coded Calc.bin no/minor vs. moderate/heavy stainingcollagen, coded Collagen.bin no/minor vs. moderate/heavy stainingfat10, coded Fat.bin_10 no/<10% fat vs. >10% fatfat40, coded Fat.bin_40 no/<40% fat vs. >40% fatintraplaque hemorrhage, coded IPH.bin no vs. yesmacrophages (CD68), coded macmean0 mean of computer-assisted calculation CD68+ region of interestsmooth muscle cells (alpha-actin), coded smcmean0 mean of computer-assisted calculation SMA+ region of interestintraplaque vessel density (CD34), coded vessel_density manually counted CD34+ cells per 3-4 hotspotsmast cells, coded Mast_cells_plaque manually counted mast cell tryptase+ cells (https://academic.oup.com/eurheartj/article/34/48/3699/484981) [Note: low sample size]neutrophils (CD66b), coded neutrophils manually counted CD66b+ cells (https://pubmed.ncbi.nlm.nih.gov/20595650/) [Note: low sample size]plaque vulnerability index, scaled from 0 to 4, where 0 is most stable, and 4 is least stable plaque phenotype.Continuous variables were inverse-rank normal transformated, indicated by _rankNorm.
We will explore the expression of target gene(s) in bulk RNAseq data from carotid plaques and associate these to plaque characteristics.
We will use the last dataset from the scRNAseq data, including 35 individuals, to the project target gene(s).
The Athero-Express Biobank Study (AE) contains plaque material of patients that underwent endarterectomyat two Dutch tertiary referral centers. Details of the study design were described before. Briefly, blood and plaque material were obtained during endarterectomy and stored at -80 ℃. Only carotid endarterectomy (CEA) patients were included in the present study. All patients provided informed consent and the study was approved by the medical ethics committee.
We genotyped the AE in three separate, but consecutive experiments. In short, DNA was extracted from EDTA blood or (when no blood was available) plaque samples of 1,858 consecutive patients from the Athero-Express Biobank Study and genotyped in 3 batches.
For the Athero-Express Genomics Study 1 (AEGS1) 891 patients (602 males, 262 females, 27 unknown sex), included between 2002 and 2007, were genotyped (440,763 markers) using the Affymetrix Genome-Wide Human SNP Array 5.0 (SNP5) chip (Affymetrix Inc., Santa Clara, CA, USA) at Eurofins Genomics (formerly known as AROS).
For the Athero-Express Genomics Study 2 (AEGS2) 954 patients (640 makes, 313 females, 1 unknown sex), included between 2002 and 2013, were genotyped (587,351 markers) using the Affymetrix AxiomⓇ GW CEU 1 Array (AxM) at the Genome Analysis Center.
For the Athero-Express Genomics Study 3 (AEGS3) 658 patients (448 males, 203 females, 5 unknown sex), included between 2002 and 2016, were genotyped (693,931 markers) using the Illumina GSA MD v1 BeadArray (GSA) at Human Genomics Facility, HUGE-F.
All experiments were carried out according to OECD standards.
We used the genotyping calling algorithms as advised by Affymetrix (AEGS1 and AEGS2) and Illumina (AEGS3):
After genotype calling, we adhered to community standard quality control and assurance (QCA) procedures of the genotype data from AEGS1, AEGS2, and AEGS3. Samples with low average genotype calling and sex discrepancies (compared to the clinical data available) were excluded. The data was further filtered on:
After QCA 2,493 samples remained, 108 of non-European descent/ancestry, and 156 related pairs. These comprise 890 samples and 407,712 SNPs in AEGS1, 869 samples and 534,508 SNPs in AEGS2, and 649954 samples and 534,508 SNPs in AEGS3 remained.
Before phasing using SHAPEIT2, data was lifted to genome build b37 using the liftOver tool from UCSC (https://genome.ucsc.edu/cgi-bin/hgLiftOver). Finally, data was imputed with 1000G phase 3, version 5 and HRC release 1.1 as a reference using the Michigan Imputation Server. These results were further integrated using QCTOOL v2, where HRC imputed variants are given precendence over 1000G phase 3 imputed variants.
We compared quality of the three AEGS datasets, and listed some variables of interest.
We checked the studytype (AE or not), and identity-by-descent (IBD) within and between datasets to aid in sample mixups, duplicate sample use, and relatedness. In addition, during genotyping quality control samples were identified that deviated from Hardy-Weinberg Equilibrium (HWE), had discordance in sex-coding and genotype sex, and deviated from the principal component analysis (PCA) plot.
We will load the Athero-Express Biobank Study data, and all the samples that were send for genotyping and the final QC’ed sampleList.
Loading Athero-Express Biobank Study clinical and biobank data, as well as the SampleList of genetic data.
cat("* get Athero-Express Biobank Study Database...")* get Athero-Express Biobank Study Database...
# METHOD 1: It seems this method gives loads of errors and warnings, which all are hard to comprehend
# or debug. We expect 3,527 samples, and 927 variables; we get 927 variables!!!
# AEdata = as.data.table(read.spss(paste0(INP_loc,"/2017-1NEW_AtheroExpressDatabase_ScientificAE_20171306_v1.0.sav"),
# trim.factor.names = TRUE, trim_values = TRUE, # we trim spaces in values
# reencode = TRUE, # we re-encode to the local locale encoding
# add.undeclared.levels = "append", # we do *not* want to convert to R-factors
# use.value.labels = FALSE, # we do *not* convert variables with value labels into R factors
# use.missings = TRUE, sub = "NA", # we will set every missing variable to NA
# duplicated.value.labels = "condense", # we will condense duplicated value labels
# to.data.frame = TRUE))
# AEdata.labels <- as.data.table(attr(AEdata, "variable.labels"))
# names(AEdata.labels) <- "Variable"
# METHOD 2: Using library("haven") importing seems flawless; best argument being:
# we expect 3,527 samples and 888 variables, which is what you'd get with this method
# So for now, METHOD 2 is prefered.
#
require(haven)
# AEDB <- haven::read_sav(paste0(AEDB_loc, "/2019-3NEW_AtheroExpressDatabase_ScientificAE_02072019_IC_added.sav"))
AEDB <- haven::read_sav(paste0(AEDB_loc, "/2020_1_NEW_AtheroExpressDatabase_ScientificAE_16-03-2020.sav"))
# writing off the SPSS data to an Excel.
# fwrite(AEdata, file = paste0(INP_loc,"/2017-1NEW_AtheroExpressDatabase_ScientificAE_20171306_v1.0.values.xlsx"),
# sep = ";", na = "NA", dec = ".", col.names = TRUE, row.names = FALSE,
# dateTimeAs = "ISO", showProgress = TRUE, verbose = TRUE)
# warnings()
AEDB[1:10, 1:10]dim(AEDB)[1] 3791 1091
cat("* get Athero-Express Genomics Study keys...")* get Athero-Express Genomics Study keys...
AEGS123.sampleList.keytable <- fread(paste0(AEGSQC_loc, "/QC/SELECTIONS/20200319.QC.AEGS123.sampleList.keytable.txt"))
dim(AEGS123.sampleList.keytable)[1] 2124 25
# AEGS123.sampleList.keytable[1:10, 1:10]We can examine the contents of the Athero-Express Biobank dataset to know what each variable is called, what class (type) it has, and what the variable description is.
There is an excellent post on this: https://www.r-bloggers.com/working-with-spss-labels-in-r/.
AEDB %>% sjPlot::view_df(show.type = TRUE,
show.frq = TRUE,
show.prc = TRUE,
show.na = TRUE,
max.len = TRUE,
wrap.labels = 20,
verbose = FALSE,
use.viewer = FALSE,
file = paste0(OUT_loc, "/", Today, ".AEDB.dictionary.html")) We need to be very strict in defining symptoms. Therefore we will fix a new variable that groups symptoms at inclusion.
Coding of symptoms is as follows:
We will group as follows:
# Fix symptoms
attach(AEDB)
AEDB[,"Symptoms.5G"] <- NA
AEDB$Symptoms.5G[sympt == 0] <- "Asymptomatic"
AEDB$Symptoms.5G[sympt == 1 | sympt == 7 | sympt == 13] <- "TIA"
AEDB$Symptoms.5G[sympt == 2 | sympt == 3] <- "Stroke"
AEDB$Symptoms.5G[sympt == 4 | sympt == 14 | sympt == 15 ] <- "Ocular"
AEDB$Symptoms.5G[sympt == 8 | sympt == 11] <- "Retinal infarction"
AEDB$Symptoms.5G[sympt == 5 | sympt == 9 | sympt == 10 | sympt == 12 | sympt == 16 | sympt == 17] <- "Other"
# AsymptSympt
AEDB[,"AsymptSympt"] <- NA
AEDB$AsymptSympt[sympt == -999] <- NA
AEDB$AsymptSympt[sympt == 0] <- "Asymptomatic"
AEDB$AsymptSympt[sympt == 1 | sympt == 7 | sympt == 13 | sympt == 2 | sympt == 3] <- "Symptomatic"
AEDB$AsymptSympt[sympt == 4 | sympt == 14 | sympt == 15 | sympt == 8 | sympt == 11 | sympt == 5 | sympt == 9 | sympt == 10 | sympt == 12 | sympt == 16 | sympt == 17] <- "Ocular and others"
# AsymptSympt
AEDB[,"AsymptSympt2G"] <- NA
AEDB$AsymptSympt2G[sympt == -999] <- NA
AEDB$AsymptSympt2G[sympt == 0] <- "Asymptomatic"
AEDB$AsymptSympt2G[sympt == 1 | sympt == 7 | sympt == 13 | sympt == 2 | sympt == 3 | sympt == 4 | sympt == 14 | sympt == 15 | sympt == 8 | sympt == 11 | sympt == 5 | sympt == 9 | sympt == 10 | sympt == 12 | sympt == 16 | sympt == 17] <- "Symptomatic"
detach(AEDB)
# table(AEDB$sympt, useNA = "ifany")
# table(AEDB$AsymptSympt2G, useNA = "ifany")
# table(AEDB$Symptoms.5G, useNA = "ifany")
#
# table(AEDB$AsymptSympt2G, AEDB$sympt, useNA = "ifany")
# table(AEDB$Symptoms.5G, AEDB$sympt, useNA = "ifany")
table(AEDB$AsymptSympt2G, AEDB$Symptoms.5G, useNA = "ifany")
Asymptomatic Ocular Other Retinal infarction Stroke TIA <NA>
Asymptomatic 333 0 0 0 0 0 0
Symptomatic 0 416 119 43 732 1045 0
<NA> 0 0 0 0 0 0 1103
# AEDB.temp <- subset(AEDB, select = c("STUDY_NUMBER", "UPID", "Age", "Gender", "Hospital", "Artery_summary", "sympt", "Symptoms.5G", "AsymptSympt"))
# require(labelled)
# AEDB.temp$Gender <- to_factor(AEDB.temp$Gender)
# AEDB.temp$Hospital <- to_factor(AEDB.temp$Hospital)
# AEDB.temp$Artery_summary <- to_factor(AEDB.temp$Artery_summary)
#
# DT::datatable(AEDB.temp[1:10,], caption = "Excerpt of the whole AEDB.", rownames = FALSE)
#
# table(AEDB.temp$Symptoms.5G, AEDB.temp$AsymptSympt)
#
# rm(AEDB.temp)We will also fix the plaquephenotypes variable.
Coding of symptoms is as follows:
# Fix plaquephenotypes
attach(AEDB)
AEDB[,"OverallPlaquePhenotype"] <- NA
AEDB$OverallPlaquePhenotype[plaquephenotype == -999] <- NA
AEDB$OverallPlaquePhenotype[plaquephenotype == -999] <- NA
AEDB$OverallPlaquePhenotype[plaquephenotype == 1] <- "fibrous"
AEDB$OverallPlaquePhenotype[plaquephenotype == 2] <- "fibroatheromatous"
AEDB$OverallPlaquePhenotype[plaquephenotype == 3] <- "atheromatous"
detach(AEDB)
# AEDB.temp <- subset(AEDB, select = c("STUDY_NUMBER", "UPID", "Age", "Gender", "Hospital", "Artery_summary", "plaquephenotype", "OverallPlaquePhenotype"))
# require(labelled)
# AEDB.temp$Gender <- to_factor(AEDB.temp$Gender)
# AEDB.temp$Hospital <- to_factor(AEDB.temp$Hospital)
# AEDB.temp$Artery_summary <- to_factor(AEDB.temp$Artery_summary)
#
# DT::datatable(AEDB.temp[1:10,], caption = "Excerpt of the whole AEDB.", rownames = FALSE)
#
# rm(AEDB.temp)We will also fix the diabetes status variable.
# Fix diabetes
attach(AEDB)
AEDB[,"DiabetesStatus"] <- NA
AEDB$DiabetesStatus[DM.composite == -999] <- NA
AEDB$DiabetesStatus[DM.composite == 0] <- "Control (no Diabetes Dx/Med)"
AEDB$DiabetesStatus[DM.composite == 1] <- "Diabetes"
detach(AEDB)
# AEDB.temp <- subset(AEDB, select = c("STUDY_NUMBER", "UPID", "Age", "Gender", "Hospital", "Artery_summary", "DM.composite", "DiabetesStatus"))
# require(labelled)
# AEDB.temp$Gender <- to_factor(AEDB.temp$Gender)
# AEDB.temp$Hospital <- to_factor(AEDB.temp$Hospital)
# AEDB.temp$Artery_summary <- to_factor(AEDB.temp$Artery_summary)
# AEDB.temp$DiabetesStatus <- to_factor(AEDB.temp$DiabetesStatus)
#
# DT::datatable(AEDB.temp[1:10,], caption = "Excerpt of the whole AEDB.", rownames = FALSE)
#
# rm(AEDB.temp)We will also fix the smoking status variable. We are interested in whether someone never, ever or is currently (at the time of inclusion) smoking. This is based on the questionnaire.
diet801: are you a smoker?diet802: did you smoke in the past?We already have some variables indicating smoking status:
SmokingReported: patient has reported to smoke.SmokingYearOR: smoking in the year of surgery?SmokerCurrent: currently smoking?require(labelled)Loading required package: labelled
AEDB$diet801 <- to_factor(AEDB$diet801)
AEDB$diet802 <- to_factor(AEDB$diet802)
AEDB$diet805 <- to_factor(AEDB$diet805)
AEDB$SmokingReported <- to_factor(AEDB$SmokingReported)
AEDB$SmokerCurrent <- to_factor(AEDB$SmokerCurrent)
AEDB$SmokingYearOR <- to_factor(AEDB$SmokingYearOR)
# table(AEDB$diet801)
# table(AEDB$diet802)
# table(AEDB$SmokingReported)
# table(AEDB$SmokerCurrent)
# table(AEDB$SmokingYearOR)
# table(AEDB$SmokingReported, AEDB$SmokerCurrent, useNA = "ifany", dnn = c("Reported smoking", "Current smoker"))
#
# table(AEDB$diet801, AEDB$diet802, useNA = "ifany", dnn = c("Smoker", "Past smoker"))
cat("\nFixing smoking status.\n")
Fixing smoking status.
attach(AEDB)
AEDB[,"SmokerStatus"] <- NA
AEDB$SmokerStatus[diet802 == "don't know"] <- "Never smoked"
AEDB$SmokerStatus[diet802 == "I still smoke"] <- "Current smoker"
AEDB$SmokerStatus[SmokerCurrent == "no" & diet802 == "no"] <- "Never smoked"
AEDB$SmokerStatus[SmokerCurrent == "no" & diet802 == "yes"] <- "Ex-smoker"
AEDB$SmokerStatus[SmokerCurrent == "yes"] <- "Current smoker"
AEDB$SmokerStatus[SmokerCurrent == "no data available/missing"] <- NA
# AEDB$SmokerStatus[is.na(SmokerCurrent)] <- "Never smoked"
detach(AEDB)
cat("\n* Current smoking status.\n")
* Current smoking status.
table(AEDB$SmokerCurrent,
useNA = "ifany",
dnn = c("Current smoker"))Current smoker
no data available/missing no yes <NA>
0 2364 1308 119
cat("\n* Updated smoking status.\n")
* Updated smoking status.
table(AEDB$SmokerStatus,
useNA = "ifany",
dnn = c("Updated smoking status"))Updated smoking status
Current smoker Ex-smoker Never smoked <NA>
1308 1814 389 280
cat("\n* Comparing to 'SmokerCurrent'.\n")
* Comparing to 'SmokerCurrent'.
table(AEDB$SmokerStatus, AEDB$SmokerCurrent,
useNA = "ifany",
dnn = c("Updated smoking status", "Current smoker")) Current smoker
Updated smoking status no data available/missing no yes <NA>
Current smoker 0 0 1308 0
Ex-smoker 0 1814 0 0
Never smoked 0 389 0 0
<NA> 0 161 0 119
# AEDB.temp <- subset(AEDB, select = c("STUDY_NUMBER", "UPID", "Age", "Gender", "Hospital", "Artery_summary", "DM.composite", "DiabetesStatus"))
# require(labelled)
# AEDB.temp$Gender <- to_factor(AEDB.temp$Gender)
# AEDB.temp$Hospital <- to_factor(AEDB.temp$Hospital)
# AEDB.temp$Artery_summary <- to_factor(AEDB.temp$Artery_summary)
# AEDB.temp$DiabetesStatus <- to_factor(AEDB.temp$DiabetesStatus)
#
# DT::datatable(AEDB.temp[1:10,], caption = "Excerpt of the whole AEDB.", rownames = FALSE)
#
# rm(AEDB.temp)We will also fix the alcohol status variable.
# Fix diabetes
attach(AEDB)
AEDB[,"AlcoholUse"] <- NA
AEDB$AlcoholUse[diet810 == -999] <- NA
AEDB$AlcoholUse[diet810 == 0] <- "No"
AEDB$AlcoholUse[diet810 == 1] <- "Yes"
detach(AEDB)
# AEDB.temp <- subset(AEDB, select = c("STUDY_NUMBER", "UPID", "Age", "Gender", "Hospital", "Artery_summary", "diet810", "AlcoholUse"))
# require(labelled)
# AEDB.temp$Gender <- to_factor(AEDB.temp$Gender)
# AEDB.temp$Hospital <- to_factor(AEDB.temp$Hospital)
# AEDB.temp$Artery_summary <- to_factor(AEDB.temp$Artery_summary)
# AEDB.temp$AlcoholUse <- to_factor(AEDB.temp$AlcoholUse)
#
# DT::datatable(AEDB.temp[1:10,], caption = "Excerpt of the whole AEDB.", rownames = FALSE)
#
# rm(AEDB.temp)We will also fix and inverse-rank normal transform the continuous (manually) scored plaque phenotypes.
AEDB$macmean0 <- as.numeric(AEDB$macmean0)
AEDB$smcmean0 <- as.numeric(AEDB$smcmean0)
AEDB$neutrophils <- as.numeric(AEDB$neutrophils)
AEDB$Mast_cells_plaque <- as.numeric(AEDB$Mast_cells_plaque)
AEDB$vessel_density_averaged <- as.numeric(AEDB$vessel_density_averaged)
AEDB$MAC_rankNorm <- qnorm((rank(AEDB$macmean0, na.last = "keep") - 0.5) / sum(!is.na(AEDB$macmean0)))
AEDB$SMC_rankNorm <- qnorm((rank(AEDB$smcmean0, na.last = "keep") - 0.5) / sum(!is.na(AEDB$smcmean0)))
AEDB$Neutrophils_rankNorm <- qnorm((rank(AEDB$neutrophils, na.last = "keep") - 0.5) / sum(!is.na(AEDB$neutrophils)))
AEDB$MastCells_rankNorm <- qnorm((rank(AEDB$Mast_cells_plaque, na.last = "keep") - 0.5) / sum(!is.na(AEDB$Mast_cells_plaque)))
AEDB$VesselDensity_rankNorm <- qnorm((rank(AEDB$vessel_density_averaged, na.last = "keep") - 0.5) / sum(!is.na(AEDB$vessel_density_averaged)))library(labelled)
AEDB$Gender <- to_factor(AEDB$Gender)
ggpubr::gghistogram(AEDB, "macmean0",
# y = "..count..",
color = "white",
fill = "Gender",
palette = c("#1290D9", "#DB003F"),
add = "median",
#add_density = TRUE,
rug = TRUE,
#add.params = list(color = "black", linetype = 2),
title = "% of macrophages (CD68)",
xlab = "% per region of interest",
ggtheme = theme_minimal())
ggpubr::gghistogram(AEDB, "MAC_rankNorm",
# y = "..count..",
color = "white",
fill = "Gender",
palette = c("#1290D9", "#DB003F"),
add = "median",
#add_density = TRUE,
rug = TRUE,
#add.params = list(color = "black", linetype = 2),
title = "% of macrophages (CD68)",
xlab = "% per region of interest\ninverse-rank normalized number",
ggtheme = theme_minimal())
ggpubr::gghistogram(AEDB, "smcmean0",
# y = "..count..",
color = "white",
fill = "Gender",
palette = c("#1290D9", "#DB003F"),
add = "median",
#add_density = TRUE,
rug = TRUE,
#add.params = list(color = "black", linetype = 2),
title = "% of smooth muscle cells (SMA)",
xlab = "% per region of interest",
ggtheme = theme_minimal())
ggpubr::gghistogram(AEDB, "SMC_rankNorm",
# y = "..count..",
color = "white",
fill = "Gender",
palette = c("#1290D9", "#DB003F"),
add = "median",
#add_density = TRUE,
rug = TRUE,
#add.params = list(color = "black", linetype = 2),
title = "% of smooth muscle cells (SMA)",
xlab = "% per region of interest\ninverse-rank normalized number",
ggtheme = theme_minimal())
ggpubr::gghistogram(AEDB, "neutrophils",
# y = "..count..",
color = "white",
fill = "Gender",
palette = c("#1290D9", "#DB003F"),
add = "median",
#add_density = TRUE,
rug = TRUE,
#add.params = list(color = "black", linetype = 2),
title = "number of neutrophils (CD66b)",
xlab = "counts per plaque",
ggtheme = theme_minimal())
ggpubr::gghistogram(AEDB, "Neutrophils_rankNorm",
# y = "..count..",
color = "white",
fill = "Gender",
palette = c("#1290D9", "#DB003F"),
add = "median",
#add_density = TRUE,
rug = TRUE,
#add.params = list(color = "black", linetype = 2),
title = "number of neutrophils (CD66b)",
xlab = "counts per plaque\ninverse-rank normalized number",
ggtheme = theme_minimal())
ggpubr::gghistogram(AEDB, "Mast_cells_plaque",
# y = "..count..",
color = "white",
fill = "Gender",
palette = c("#1290D9", "#DB003F"),
add = "median",
#add_density = TRUE,
rug = TRUE,
#add.params = list(color = "black", linetype = 2),
title = "number of mast cells",
xlab = "counts per plaque",
ggtheme = theme_minimal())
ggpubr::gghistogram(AEDB, "MastCells_rankNorm",
# y = "..count..",
color = "white",
fill = "Gender",
palette = c("#1290D9", "#DB003F"),
add = "median",
#add_density = TRUE,
rug = TRUE,
#add.params = list(color = "black", linetype = 2),
title = "number of mast cells",
xlab = "counts per plaque\ninverse-rank normalized number",
ggtheme = theme_minimal())
ggpubr::gghistogram(AEDB, "vessel_density_averaged",
# y = "..count..",
color = "white",
fill = "Gender",
palette = c("#1290D9", "#DB003F"),
add = "median",
#add_density = TRUE,
rug = TRUE,
#add.params = list(color = "black", linetype = 2),
title = "number of intraplaque neovessels",
xlab = "counts per 3-4 hotspots",
ggtheme = theme_minimal())
ggpubr::gghistogram(AEDB, "VesselDensity_rankNorm",
# y = "..count..",
color = "white",
fill = "Gender",
palette = c("#1290D9", "#DB003F"),
add = "median",
#add_density = TRUE,
rug = TRUE,
#add.params = list(color = "black", linetype = 2),
title = "number of intraplaque neovessels",
xlab = "counts per 3-4 hotspots\ninverse-rank normalized number",
ggtheme = theme_minimal())Here we calculate the plaque instability/vulnerability index
# Plaque vulnerability
require(labelled)
AEDB$Macrophages.bin <- to_factor(AEDB$Macrophages.bin)
AEDB$SMC.bin <- to_factor(AEDB$SMC.bin)
AEDB$IPH.bin <- to_factor(AEDB$IPH.bin)
AEDB$Calc.bin <- to_factor(AEDB$Calc.bin)
AEDB$Collagen.bin <- to_factor(AEDB$Collagen.bin)
AEDB$Fat.bin_10 <- to_factor(AEDB$Fat.bin_10)
AEDB$Fat.bin_40 <- to_factor(AEDB$Fat.bin_40)
table(AEDB$Macrophages.bin)
no/minor moderate/heavy
1602 1215
table(AEDB$Fat.bin_10)
<10% >10%
1226 1628
table(AEDB$Collagen.bin)
no/minor moderate/heavy
540 2297
table(AEDB$SMC.bin)
no/minor moderate/heavy
870 1962
table(AEDB$IPH.bin)
no yes
1223 1628
# SPSS code
#
# *** syntax- Plaque vulnerability**.
# COMPUTE Macro_instab = -999.
# IF macrophages.bin=2 Macro_instab=1.
# IF macrophages.bin=1 Macro_instab=0.
# EXECUTE.
#
# COMPUTE Fat10_instab = -999.
# IF Fat.bin_10=2 Fat10_instab=1.
# IF Fat.bin_10=1 Fat10_instab=0.
# EXECUTE.
#
# COMPUTE coll_instab=-999.
# IF Collagen.bin=2 coll_instab=0.
# IF Collagen.bin=1 coll_instab=1.
# EXECUTE.
#
#
# COMPUTE SMC_instab=-999.
# IF SMC.bin=2 SMC_instab=0.
# IF SMC.bin=1 SMC_instab=1.
# EXECUTE.
#
# COMPUTE IPH_instab=-999.
# IF IPH.bin=0 IPH_instab=0.
# IF IPH.bin=1 IPH_instab=1.
# EXECUTE.
#
# COMPUTE Instability=Macro_instab + Fat10_instab + coll_instab + SMC_instab + IPH_instab.
# EXECUTE.
# Fix plaquephenotypes
attach(AEDB)
# mac instability
AEDB[,"MAC_Instability"] <- NA
AEDB$MAC_Instability[Macrophages.bin == -999] <- NA
AEDB$MAC_Instability[Macrophages.bin == "no/minor"] <- 0
AEDB$MAC_Instability[Macrophages.bin == "moderate/heavy"] <- 1
# fat instability
AEDB[,"FAT10_Instability"] <- NA
AEDB$FAT10_Instability[Fat.bin_10 == -999] <- NA
AEDB$FAT10_Instability[Fat.bin_10 == " <10%"] <- 0
AEDB$FAT10_Instability[Fat.bin_10 == " >10%"] <- 1
# col instability
AEDB[,"COL_Instability"] <- NA
AEDB$COL_Instability[Collagen.bin == -999] <- NA
AEDB$COL_Instability[Collagen.bin == "no/minor"] <- 1
AEDB$COL_Instability[Collagen.bin == "moderate/heavy"] <- 0
# smc instability
AEDB[,"SMC_Instability"] <- NA
AEDB$SMC_Instability[SMC.bin == -999] <- NA
AEDB$SMC_Instability[SMC.bin == "no/minor"] <- 1
AEDB$SMC_Instability[SMC.bin == "moderate/heavy"] <- 0
# iph instability
AEDB[,"IPH_Instability"] <- NA
AEDB$IPH_Instability[IPH.bin == -999] <- NA
AEDB$IPH_Instability[IPH.bin == "no"] <- 0
AEDB$IPH_Instability[IPH.bin == "yes"] <- 1
detach(AEDB)
table(AEDB$MAC_Instability, useNA = "ifany")
0 1 <NA>
1602 1215 974
table(AEDB$FAT10_Instability, useNA = "ifany")
0 1 <NA>
1226 1628 937
table(AEDB$COL_Instability, useNA = "ifany")
0 1 <NA>
2297 540 954
table(AEDB$SMC_Instability, useNA = "ifany")
0 1 <NA>
1962 870 959
table(AEDB$IPH_Instability, useNA = "ifany")
0 1 <NA>
1223 1628 940
# creating vulnerability index
AEDB <- AEDB %>% mutate(Plaque_Vulnerability_Index = factor(rowSums(.[grep("_Instability", names(.))], na.rm = TRUE)),
)
table(AEDB$Plaque_Vulnerability_Index, useNA = "ifany")
0 1 2 3 4 5
1324 655 728 676 298 110
# str(AEDB$Plaque_Vulnerability_Index)We are interested in the following variables at baseline.
Age (years)
Female sex (N, %)
Hypertension (N, %)
SBP (mmHg)
DBP (mmHg)
Diabetes mellitus (N, %)
Total cholesterol levels (mg/dL)
LDL cholesterol levels (mg/dL)
HDL cholesterol levels (mg/dL)
Triglyceride levels (mg/dL)
Use of statins (N, %)
Use of antiplatelet drugs (N, %)
BMI (kg/m²)
Smoking status (N, %)
History of CAD (N, %)
History of PAD (N, %)
Clinical manifestations
eGFR (mL/min/1.73 m²)
stenosis
year of surgery
plaque characteristics
PCSK9
For this project we also fix the PCSK9 levels for analyses.
Measurement: This was measured in citrate plasma, at pg/mL using a LUMINEX assay.
# Fix hormones
attach(AEDB)
AEDB[,"Plasma_PCSK9"] <- NA
AEDB$Plasma_PCSK9 <- as.numeric(AEDB$PCSK9_plasma)
AEDB$Plasma_PCSK9[PCSK9_plasma == -999] <- NA
AEDB$Plasma_PCSK9[PCSK9_plasma == -888] <- NA
AEDB$Plasma_PCSK9[PCSK9_plasma == -777] <- NA
AEDB$Plasma_PCSK9[PCSK9_plasma == -666] <- NA
detach(AEDB)
AEDB$Plasma_PCSK9_rankNorm <- qnorm((rank(AEDB$PCSK9_plasma, na.last = "keep") - 0.5) / sum(!is.na(AEDB$PCSK9_plasma)))
AEDB.temp <- subset(AEDB, select = c("STUDY_NUMBER", "UPID", "Age", "Gender", "Hospital", "Artery_summary", "PCSK9_plasma", "Plasma_PCSK9", "Plasma_PCSK9_rankNorm"))
require(labelled)
AEDB.temp$Gender <- to_factor(AEDB.temp$Gender)
AEDB.temp$Hospital <- to_factor(AEDB.temp$Hospital)
AEDB.temp$Artery_summary <- to_factor(AEDB.temp$Artery_summary)
DT::datatable(AEDB.temp[1:10,], caption = "Excerpt of the whole AEDB.", rownames = FALSE)
rm(AEDB.temp)library(labelled)
AEDB$Gender <- to_factor(AEDB$Gender)
AEDB$PCSK9_plasma <- as.numeric(AEDB$PCSK9_plasma)
ggpubr::gghistogram(AEDB, "Plasma_PCSK9",
# y = "..count..",
color = "white",
fill = "Gender",
palette = c("#1290D9", "#DB003F"),
add = "median",
#add_density = TRUE,
rug = TRUE,
#add.params = list(color = "black", linetype = 2),
title = "PCSK9 (citrate-plasma)",
xlab = "pg/mL",
ggtheme = theme_minimal())
ggpubr::gghistogram(AEDB, "Plasma_PCSK9_rankNorm",
# y = "..count..",
color = "white",
fill = "Gender",
palette = c("#1290D9", "#DB003F"),
add = "median",
#add_density = TRUE,
rug = TRUE,
#add.params = list(color = "black", linetype = 2),
title = "PCSK9 (citrate-plasma)",
xlab = "pg/mL\ninverse-rank normalized",
ggtheme = theme_minimal())cat("====================================================================================================\n")====================================================================================================
cat("SELECTION THE SHIZZLE\n")SELECTION THE SHIZZLE
### Artery levels
# AEdata$Artery_summary:
# value label
# NOT USE - 0 No artery known (yet), no surgery (patient ill, died, exited study), re-numbered to AAA
# USE - 1 carotid (left & right)
# USE - 2 femoral/iliac (left, right or both sides)
# NOT USE - 3 other carotid arteries (common, external)
# NOT USE - 4 carotid bypass and injury (left, right or both sides)
# NOT USE - 5 aneurysmata (carotid & femoral)
# NOT USE - 6 aorta
# NOT USE - 7 other arteries (renal, popliteal, vertebral)
# NOT USE - 8 femoral bypass, angioseal and injury (left, right or both sides)
### AEdata$informedconsent
# value label
# NOT USE - -999 missing
# NOT USE - 0 no, died
# USE - 1 yes
# USE - 2 yes, health treatment when possible
# USE - 3 yes, no health treatment
# USE - 4 yes, no health treatment, no commercial business
# NOT USE - 5 yes, no tissue, no commerical business
# NOT USE - 6 yes, no tissue, no questionnaires, no medical info, no commercial business
# USE - 7 yes, no questionnaires, no health treatment, no commercial business
# USE - 8 yes, no questionnaires, health treatment when possible
# NOT USE - 9 yes, no tissue, no questionnaires, no health treatment, no commerical business
# USE - 10 yes, no health treatment, no medical info, no commercial business
# NOT USE - 11 yes, no tissue, no questionnaires, no health treatment, no medical info, no commercial business
# USE - 12 yes, no questionnaires, no health treatment
# NOT USE - 13 yes, no tissue, no health treatment
# NOT USE - 14 yes, no tissue, no questionnaires
# NOT USE - 15 yes, no tissue, health treatment when possible
# NOT USE - 16 yes, no tissue
# USE - 17 yes, no commerical business
# USE - 18 yes, health treatment when possible, no commercial business
# USE - 19 yes, no medical info, no commercial business
# USE - 20 yes, no questionnaires
# NOT USE - 21 yes, no tissue, no questionnaires, no health treatment, no medical info
# NOT USE - 22 yes, no tissue, no questionnaires, no health treatment, no commercial business
# USE - 23 yes, no medical info
# USE - 24 yes, no questionnaires, no commercial business
# USE - 25 yes, no questionnaires, no health treatment, no medical info
# USE - 26 yes, no questionnaires, health treatment when possible, no commercial business
# USE - 27 yes, no health treatment, no medical info
# NOT USE - 28 no, doesn't want to
# NOT USE - 29 no, unable to sign
# NOT USE - 30 no, no reaction
# NOT USE - 31 no, lost
# NOT USE - 32 no, too old
# NOT USE - 34 yes, no medical info, health treatment when possible
# NOT USE - 35 no (never asked for IC because there was no tissue)
# USE - 36 yes, no medical info, no commercial business, health treatment when possible
# NOT USE - 37 no, endpoint
# USE - 38 wil niets invullen, wel alles gebruiken
# USE - 39 second informed concents: yes, no commercial business
# NOT USE - 40 nooit geincludeerd
cat("- sanity checking PRIOR to selection")- sanity checking PRIOR to selection
library(data.table)
require(labelled)
ae.gender <- to_factor(AEDB$Gender)
ae.hospital <- to_factor(AEDB$Hospital)
table(ae.gender, ae.hospital, dnn = c("Sex", "Hospital")) Hospital
Sex St. Antonius, Nieuwegein UMC Utrecht
female 524 636
male 1211 1420
ae.artery <- to_factor(AEDB$Artery_summary)
table(ae.artery, ae.gender, dnn = c("Sex", "Artery")) Artery
Sex female male
No artery known (yet), no surgery (patient ill, died, exited study), re-numbered to AAA 0 0
carotid (left & right) 805 1781
femoral/iliac (left, right or both sides) 320 796
other carotid arteries (common, external) 17 35
carotid bypass and injury (left, right or both sides) 6 3
aneurysmata (carotid & femoral) 1 0
aorta 3 5
other arteries (renal, popliteal, vertebral) 4 9
femoral bypass, angioseal and injury (left, right or both sides) 4 2
rm(ae.gender, ae.hospital, ae.artery)
# I change numeric and factors manually because, well, I wouldn't know how to fix it otherwise
# to have this 'tibble' work with 'tableone'... :-)
AEDB$Age <- as.numeric(AEDB$Age)
AEDB$diastoli <- as.numeric(AEDB$diastoli)
AEDB$systolic <- as.numeric(AEDB$systolic)
AEDB$TC_finalCU <- as.numeric(AEDB$TC_finalCU)
AEDB$LDL_finalCU <- as.numeric(AEDB$LDL_finalCU)
AEDB$HDL_finalCU <- as.numeric(AEDB$HDL_finalCU)
AEDB$TG_finalCU <- as.numeric(AEDB$TG_finalCU)
AEDB$TC_final <- as.numeric(AEDB$TC_final)
AEDB$LDL_final <- as.numeric(AEDB$LDL_final)
AEDB$HDL_final <- as.numeric(AEDB$HDL_final)
AEDB$TG_final <- as.numeric(AEDB$TG_final)
AEDB$Age <- as.numeric(AEDB$Age)
AEDB$GFR_MDRD <- as.numeric(AEDB$GFR_MDRD)
AEDB$BMI <- as.numeric(AEDB$BMI)
AEDB$eCigarettes <- as.numeric(AEDB$eCigarettes)
AEDB$ePackYearsSmoking <- as.numeric(AEDB$ePackYearsSmoking)
AEDB$EP_composite_time <- as.numeric(AEDB$EP_composite_time)
AEDB$EP_major_time <- as.numeric(AEDB$EP_major_time)
AEDB$Plasma_PCSK9 <- as.numeric(AEDB$Plasma_PCSK9)
AEDB$Plasma_PCSK9_rankNorm <- as.numeric(AEDB$Plasma_PCSK9_rankNorm)
require(labelled)
AEDB$ORyear <- to_factor(AEDB$ORyear)
AEDB$Gender <- to_factor(AEDB$Gender)
AEDB$Hospital <- to_factor(AEDB$Hospital)
AEDB$KDOQI <- to_factor(AEDB$KDOQI)
AEDB$BMI_WHO <- to_factor(AEDB$BMI_WHO)
AEDB$DiabetesStatus <- to_factor(AEDB$DiabetesStatus)
AEDB$SmokerStatus <- to_factor(AEDB$SmokerStatus)
AEDB$AlcoholUse <- to_factor(AEDB$AlcoholUse)
AEDB$Hypertension.selfreport <- to_factor(AEDB$Hypertension1)
AEDB$Hypertension.selfreportdrug <- to_factor(AEDB$Hypertension2)
AEDB$Hypertension.composite <- to_factor(AEDB$Hypertension.composite)
AEDB$Hypertension.drugs <- to_factor(AEDB$Hypertension.drugs)
AEDB$Med.anticoagulants <- to_factor(AEDB$Med.anticoagulants)
AEDB$Med.all.antiplatelet <- to_factor(AEDB$Med.all.antiplatelet)
AEDB$Med.Statin.LLD <- to_factor(AEDB$Med.Statin.LLD)
AEDB$Stroke_Dx <- to_factor(AEDB$Stroke_Dx)
AEDB$CAD_history <- to_factor(AEDB$CAD_history)
AEDB$PAOD <- to_factor(AEDB$PAOD)
AEDB$Peripheral.interv <- to_factor(AEDB$Peripheral.interv)
AEDB$sympt <- to_factor(AEDB$sympt)
AEDB$Symptoms.3g <- to_factor(AEDB$Symptoms.3g)
AEDB$Symptoms.4g <- to_factor(AEDB$Symptoms.4g)
AEDB$Symptoms.5G <- to_factor(AEDB$Symptoms.5G)
AEDB$AsymptSympt <- to_factor(AEDB$AsymptSympt)
AEDB$AsymptSympt2G <- to_factor(AEDB$AsymptSympt2G)
AEDB$restenos <- to_factor(AEDB$restenos)
AEDB$stenose <- to_factor(AEDB$stenose)
AEDB$EP_composite <- to_factor(AEDB$EP_composite)
AEDB$EP_major <- to_factor(AEDB$EP_major)
AEDB$Macrophages.bin <- to_factor(AEDB$Macrophages.bin)
AEDB$SMC.bin <- to_factor(AEDB$SMC.bin)
AEDB$IPH.bin <- to_factor(AEDB$IPH.bin)
AEDB$Calc.bin <- to_factor(AEDB$Calc.bin)
AEDB$Collagen.bin <- to_factor(AEDB$Collagen.bin)
AEDB$Fat.bin_10 <- to_factor(AEDB$Fat.bin_10)
AEDB$Fat.bin_40 <- to_factor(AEDB$Fat.bin_40)
AEDB$OverallPlaquePhenotype <- to_factor(AEDB$OverallPlaquePhenotype)
AEDB$Plaque_Vulnerability_Index <- to_factor(AEDB$Plaque_Vulnerability_Index)
AEDB$Artery_summary <- to_factor(AEDB$Artery_summary)
AEDB$informedconsent <- to_factor(AEDB$informedconsent)
AEDB.CEA <- subset(AEDB,
(Artery_summary == "carotid (left & right)" | Artery_summary == "other carotid arteries (common, external)") & # we only want carotids
informedconsent != "missing" & # we are really strict in selecting based on 'informed consent'!
informedconsent != "no, died" &
informedconsent != "yes, no tissue, no commerical business" &
informedconsent != "yes, no tissue, no questionnaires, no medical info, no commercial business" &
informedconsent != "yes, no tissue, no questionnaires, no health treatment, no commerical business" &
informedconsent != "yes, no tissue, no questionnaires, no health treatment, no medical info, no commercial business" &
informedconsent != "yes, no tissue, no health treatment" &
informedconsent != "yes, no tissue, no questionnaires" &
informedconsent != "yes, no tissue, health treatment when possible" &
informedconsent != "yes, no tissue" &
informedconsent != "yes, no tissue, no questionnaires, no health treatment, no medical info" &
informedconsent != "yes, no tissue, no questionnaires, no health treatment, no commercial business" &
informedconsent != "no, doesn't want to" &
informedconsent != "no, unable to sign" &
informedconsent != "no, no reaction" &
informedconsent != "no, lost" &
informedconsent != "no, too old" &
informedconsent != "yes, no medical info, health treatment when possible" &
informedconsent != "no (never asked for IC because there was no tissue)" &
informedconsent != "no, endpoint" &
informedconsent != "nooit geincludeerd" &
!is.na(AsymptSympt2G))
# AEDB.CEA[1:10, 1:10]
dim(AEDB.CEA)[1] 2421 1113
AEDB.full <- subset(AEDB,
informedconsent != "missing" & # we are really strict in selecting based on 'informed consent'!
informedconsent != "no, died" &
informedconsent != "yes, no tissue, no commerical business" &
informedconsent != "yes, no tissue, no questionnaires, no medical info, no commercial business" &
informedconsent != "yes, no tissue, no questionnaires, no health treatment, no commerical business" &
informedconsent != "yes, no tissue, no questionnaires, no health treatment, no medical info, no commercial business" &
informedconsent != "yes, no tissue, no health treatment" &
informedconsent != "yes, no tissue, no questionnaires" &
informedconsent != "yes, no tissue, health treatment when possible" &
informedconsent != "yes, no tissue" &
informedconsent != "yes, no tissue, no questionnaires, no health treatment, no medical info" &
informedconsent != "yes, no tissue, no questionnaires, no health treatment, no commercial business" &
informedconsent != "no, doesn't want to" &
informedconsent != "no, unable to sign" &
informedconsent != "no, no reaction" &
informedconsent != "no, lost" &
informedconsent != "no, too old" &
informedconsent != "yes, no medical info, health treatment when possible" &
informedconsent != "no (never asked for IC because there was no tissue)" &
informedconsent != "no, endpoint" &
informedconsent != "nooit geincludeerd")
# AEDB.CEA[1:10, 1:10]
dim(AEDB.full)[1] 3458 1113
cat("===========================================================================================\n")===========================================================================================
cat("CREATE BASELINE TABLE\n")CREATE BASELINE TABLE
# Baseline table variables
basetable_vars = c("Hospital", "ORyear",
"Age", "Gender",
# "TC_finalCU", "LDL_finalCU", "HDL_finalCU", "TG_finalCU",
"TC_final", "LDL_final", "HDL_final", "TG_final",
# "hsCRP_plasma",
"systolic", "diastoli", "GFR_MDRD", "BMI",
"KDOQI", "BMI_WHO",
"SmokerStatus", "AlcoholUse",
"DiabetesStatus",
"Hypertension.selfreport", "Hypertension.selfreportdrug", "Hypertension.composite", "Hypertension.drugs",
"Med.anticoagulants", "Med.all.antiplatelet", "Med.Statin.LLD",
"Stroke_Dx", "sympt", "Symptoms.5G", "AsymptSympt", "AsymptSympt2G",
"restenos", "stenose",
"CAD_history", "PAOD", "Peripheral.interv",
"EP_composite", "EP_composite_time", "EP_major", "EP_major_time",
"MAC_rankNorm", "SMC_rankNorm", "Macrophages.bin", "SMC.bin",
"Neutrophils_rankNorm", "MastCells_rankNorm",
"IPH.bin", "VesselDensity_rankNorm",
"Calc.bin", "Collagen.bin",
"Fat.bin_10", "Fat.bin_40",
"OverallPlaquePhenotype", "Plaque_Vulnerability_Index",
"Plasma_PCSK9", "Plasma_PCSK9_rankNorm")
basetable_bin = c("Gender",
"KDOQI", "BMI_WHO",
"SmokerStatus", "AlcoholUse",
"DiabetesStatus",
"Hypertension.selfreport", "Hypertension.selfreportdrug", "Hypertension.composite", "Hypertension.drugs",
"Med.anticoagulants", "Med.all.antiplatelet", "Med.Statin.LLD",
"Stroke_Dx", "sympt", "Symptoms.5G", "AsymptSympt", "AsymptSympt2G",
"restenos", "stenose",
"CAD_history", "PAOD", "Peripheral.interv",
"EP_composite", "Macrophages.bin", "SMC.bin",
"IPH.bin",
"Calc.bin", "Collagen.bin",
"Fat.bin_10", "Fat.bin_40",
"OverallPlaquePhenotype", "Plaque_Vulnerability_Index")
# basetable_bin
basetable_con = basetable_vars[!basetable_vars %in% basetable_bin]
# basetable_conShowing the baseline table of the whole Athero-Express Biobank.
# Create baseline tables
# http://rstudio-pubs-static.s3.amazonaws.com/13321_da314633db924dc78986a850813a50d5.html
AEDB.tableOne = print(CreateTableOne(vars = basetable_vars,
# factorVars = basetable_bin,
# strata = "Symptoms.4g",
data = AEDB.full, includeNA = TRUE),
nonnormal = c(), missing = TRUE,
quote = FALSE, noSpaces = FALSE, showAllLevels = TRUE, explain = TRUE,
format = "pf",
contDigits = 3)[,1:3]
level Overall Missing
n 3458
Hospital % (freq) St. Antonius, Nieuwegein 45.3 (1567) 0.0
UMC Utrecht 54.7 (1891)
ORyear % (freq) No data available/missing 0.0 ( 0) 0.0
2002 2.5 ( 86)
2003 5.5 ( 191)
2004 7.7 ( 265)
2005 8.0 ( 276)
2006 7.3 ( 254)
2007 6.0 ( 206)
2008 5.5 ( 190)
2009 7.1 ( 246)
2010 8.0 ( 278)
2011 7.0 ( 243)
2012 8.2 ( 284)
2013 6.8 ( 235)
2014 8.1 ( 281)
2015 2.2 ( 77)
2016 3.5 ( 121)
2017 2.3 ( 79)
2018 2.2 ( 77)
2019 2.0 ( 69)
Age (mean (SD)) 68.733 (9.214) 0.0
Gender % (freq) female 29.7 (1026) 0.0
male 70.3 (2432)
TC_final (mean (SD)) 4.769 (1.406) 45.9
LDL_final (mean (SD)) 2.762 (1.061) 53.6
HDL_final (mean (SD)) 1.207 (0.436) 50.2
TG_final (mean (SD)) 1.738 (1.099) 51.0
systolic (mean (SD)) 150.820 (24.992) 13.0
diastoli (mean (SD)) 80.072 (22.444) 13.0
GFR_MDRD (mean (SD)) 74.865 (24.412) 6.6
BMI (mean (SD)) 26.395 (3.997) 5.8
KDOQI % (freq) No data available/missing 0.0 ( 0) 6.6
Normal kidney function 21.7 ( 750)
CKD 2 (Mild) 47.9 (1656)
CKD 3 (Moderate) 21.7 ( 750)
CKD 4 (Severe) 1.5 ( 51)
CKD 5 (Failure) 0.6 ( 22)
<NA> 6.6 ( 229)
BMI_WHO % (freq) No data available/missing 0.0 ( 0) 5.8
Underweight 1.1 ( 37)
Normal 35.3 (1222)
Overweight 43.6 (1508)
Obese 14.1 ( 489)
<NA> 5.8 ( 202)
SmokerStatus % (freq) Current smoker 34.3 (1187) 5.7
Ex-smoker 49.6 (1716)
Never smoked 10.4 ( 358)
<NA> 5.7 ( 197)
AlcoholUse % (freq) No 32.2 (1114) 4.1
Yes 63.7 (2203)
<NA> 4.1 ( 141)
DiabetesStatus % (freq) Control (no Diabetes Dx/Med) 73.3 (2534) 1.2
Diabetes 25.5 ( 883)
<NA> 1.2 ( 41)
Hypertension.selfreport % (freq) No data available/missing 0.0 ( 0) 3.6
no 23.7 ( 820)
yes 72.7 (2513)
<NA> 3.6 ( 125)
Hypertension.selfreportdrug % (freq) No data available/missing 0.0 ( 0) 5.0
no 29.0 (1002)
yes 66.0 (2284)
<NA> 5.0 ( 172)
Hypertension.composite % (freq) No data available/missing 0.0 ( 0) 1.3
no 13.5 ( 466)
yes 85.2 (2947)
<NA> 1.3 ( 45)
Hypertension.drugs % (freq) No data available/missing 0.0 ( 0) 1.5
no 21.0 ( 725)
yes 77.5 (2681)
<NA> 1.5 ( 52)
Med.anticoagulants % (freq) No data available/missing 0.0 ( 0) 1.6
no 85.8 (2967)
yes 12.6 ( 434)
<NA> 1.6 ( 57)
Med.all.antiplatelet % (freq) No data available/missing 0.0 ( 0) 1.6
no 13.3 ( 459)
yes 85.1 (2943)
<NA> 1.6 ( 56)
Med.Statin.LLD % (freq) No data available/missing 0.0 ( 0) 1.5
no 21.0 ( 727)
yes 77.4 (2678)
<NA> 1.5 ( 53)
Stroke_Dx % (freq) Missing 0.0 ( 0) 7.5
No stroke diagnosed 75.6 (2613)
Stroke diagnosed 16.9 ( 586)
<NA> 7.5 ( 259)
sympt % (freq) missing 0.0 ( 0) 28.3
Asymptomatic 9.1 ( 314)
TIA 28.0 ( 968)
minor stroke 11.8 ( 408)
Major stroke 6.9 ( 238)
Amaurosis fugax 11.0 ( 380)
Four vessel disease 1.1 ( 39)
Vertebrobasilary TIA 0.1 ( 5)
Retinal infarction 1.0 ( 34)
Symptomatic, but aspecific symtoms 1.6 ( 57)
Contralateral symptomatic occlusion 0.3 ( 11)
retinal infarction 0.2 ( 6)
armclaudication due to occlusion subclavian artery, CEA needed for bypass 0.0 ( 1)
retinal infarction + TIAs 0.0 ( 0)
Ocular ischemic syndrome 0.5 ( 16)
ischemisch glaucoom 0.0 ( 0)
subclavian steal syndrome 0.1 ( 2)
TGA 0.0 ( 0)
<NA> 28.3 ( 979)
Symptoms.5G % (freq) Asymptomatic 9.1 ( 314) 28.3
Ocular 11.5 ( 396)
Other 3.2 ( 110)
Retinal infarction 1.2 ( 40)
Stroke 18.7 ( 646)
TIA 28.1 ( 973)
<NA> 28.3 ( 979)
AsymptSympt % (freq) Asymptomatic 9.1 ( 314) 28.3
Ocular and others 15.8 ( 546)
Symptomatic 46.8 (1619)
<NA> 28.3 ( 979)
AsymptSympt2G % (freq) Asymptomatic 9.1 ( 314) 28.3
Symptomatic 62.6 (2165)
<NA> 28.3 ( 979)
restenos % (freq) missing 0.0 ( 0) 3.9
de novo 87.0 (3007)
restenosis 9.0 ( 312)
stenose bij angioseal na PTCA 0.1 ( 5)
<NA> 3.9 ( 134)
stenose % (freq) missing 0.0 ( 0) 6.7
0-49% 0.7 ( 23)
50-70% 6.9 ( 240)
70-90% 36.0 (1246)
90-99% 30.0 (1038)
100% (Occlusion) 14.4 ( 497)
NA 0.1 ( 3)
50-99% 2.5 ( 86)
70-99% 2.7 ( 93)
99 0.1 ( 2)
<NA> 6.7 ( 230)
CAD_history % (freq) Missing 0.0 ( 0) 2.0
No history CAD 64.1 (2216)
History CAD 33.9 (1173)
<NA> 2.0 ( 69)
PAOD % (freq) missing/no data 0.0 ( 0) 1.5
no 55.7 (1927)
yes 42.8 (1479)
<NA> 1.5 ( 52)
Peripheral.interv % (freq) no 68.0 (2352) 2.9
yes 29.1 (1006)
<NA> 2.9 ( 100)
EP_composite % (freq) No data available. 0.0 ( 0) 5.1
No composite endpoints 63.1 (2181)
Composite endpoints 31.9 (1102)
<NA> 5.1 ( 175)
EP_composite_time (mean (SD)) 2.335 (1.166) 5.2
EP_major % (freq) No data available. 0.0 ( 0) 5.1
No major events (endpoints) 83.3 (2882)
Major events (endpoints) 11.6 ( 401)
<NA> 5.1 ( 175)
EP_major_time (mean (SD)) 2.735 (0.988) 5.2
MAC_rankNorm (mean (SD)) 0.004 (0.993) 32.9
SMC_rankNorm (mean (SD)) -0.003 (1.004) 32.9
Macrophages.bin % (freq) no/minor 41.6 (1437) 26.3
moderate/heavy 32.2 (1112)
<NA> 26.3 ( 909)
SMC.bin % (freq) no/minor 23.1 ( 799) 25.9
moderate/heavy 51.0 (1763)
<NA> 25.9 ( 896)
Neutrophils_rankNorm (mean (SD)) 0.010 (0.953) 91.0
MastCells_rankNorm (mean (SD)) -0.009 (1.000) 93.0
IPH.bin % (freq) no 31.8 (1099) 25.3
yes 42.9 (1483)
<NA> 25.3 ( 876)
VesselDensity_rankNorm (mean (SD)) 0.017 (0.978) 48.3
Calc.bin % (freq) no/minor 38.0 (1314) 25.3
moderate/heavy 36.7 (1269)
<NA> 25.3 ( 875)
Collagen.bin % (freq) no/minor 14.3 ( 496) 25.8
moderate/heavy 59.9 (2071)
<NA> 25.8 ( 891)
Fat.bin_10 % (freq) <10% 31.7 (1096) 25.3
>10% 43.0 (1488)
<NA> 25.3 ( 874)
Fat.bin_40 % (freq) <40% 59.5 (2056) 25.3
>40% 15.3 ( 528)
<NA> 25.3 ( 874)
OverallPlaquePhenotype % (freq) atheromatous 14.7 ( 507) 25.9
fibroatheromatous 22.2 ( 769)
fibrous 37.2 (1285)
<NA> 25.9 ( 897)
Plaque_Vulnerability_Index % (freq) 0 35.0 (1212) 0.0
1 17.1 ( 590)
2 19.0 ( 657)
3 18.0 ( 622)
4 8.0 ( 277)
5 2.9 ( 100)
Plasma_PCSK9 (mean (SD)) 31932.936 (18395.193) 59.2
Plasma_PCSK9_rankNorm (mean (SD)) -0.002 (1.004) 59.2
# Create baseline tables
# http://rstudio-pubs-static.s3.amazonaws.com/13321_da314633db924dc78986a850813a50d5.html
AEDB.CEA.tableOne = print(CreateTableOne(vars = basetable_vars,
# factorVars = basetable_bin,
# strata = "Symptoms.4g",
data = AEDB.CEA, includeNA = TRUE),
nonnormal = c(), missing = TRUE,
quote = FALSE, noSpaces = FALSE, showAllLevels = TRUE, explain = TRUE,
format = "pf",
contDigits = 3)[,1:3]
level Overall Missing
n 2421
Hospital % (freq) St. Antonius, Nieuwegein 39.2 ( 948) 0.0
UMC Utrecht 60.8 (1473)
ORyear % (freq) No data available/missing 0.0 ( 0) 0.0
2002 3.3 ( 81)
2003 6.5 ( 157)
2004 7.8 ( 190)
2005 7.6 ( 185)
2006 7.6 ( 183)
2007 6.3 ( 152)
2008 5.7 ( 138)
2009 7.5 ( 181)
2010 6.6 ( 159)
2011 6.7 ( 163)
2012 7.3 ( 176)
2013 6.2 ( 149)
2014 6.7 ( 163)
2015 3.1 ( 76)
2016 3.5 ( 85)
2017 2.7 ( 65)
2018 2.7 ( 66)
2019 2.1 ( 52)
Age (mean (SD)) 69.105 (9.302) 0.0
Gender % (freq) female 30.5 ( 738) 0.0
male 69.5 (1683)
TC_final (mean (SD)) 4.786 (1.457) 38.0
LDL_final (mean (SD)) 2.808 (1.081) 45.6
HDL_final (mean (SD)) 1.203 (0.440) 41.7
TG_final (mean (SD)) 1.709 (1.031) 42.8
systolic (mean (SD)) 152.419 (25.166) 11.3
diastoli (mean (SD)) 81.318 (25.188) 11.3
GFR_MDRD (mean (SD)) 73.121 (21.152) 5.4
BMI (mean (SD)) 26.488 (3.977) 5.9
KDOQI % (freq) No data available/missing 0.0 ( 0) 5.5
Normal kidney function 19.1 ( 462)
CKD 2 (Mild) 50.9 (1232)
CKD 3 (Moderate) 22.8 ( 553)
CKD 4 (Severe) 1.3 ( 32)
CKD 5 (Failure) 0.4 ( 10)
<NA> 5.5 ( 132)
BMI_WHO % (freq) No data available/missing 0.0 ( 0) 5.9
Underweight 1.0 ( 24)
Normal 35.1 ( 850)
Overweight 43.4 (1051)
Obese 14.5 ( 352)
<NA> 5.9 ( 144)
SmokerStatus % (freq) Current smoker 33.2 ( 803) 5.9
Ex-smoker 48.0 (1163)
Never smoked 12.9 ( 313)
<NA> 5.9 ( 142)
AlcoholUse % (freq) No 34.5 ( 835) 4.0
Yes 61.5 (1488)
<NA> 4.0 ( 98)
DiabetesStatus % (freq) Control (no Diabetes Dx/Med) 75.2 (1820) 1.1
Diabetes 23.7 ( 574)
<NA> 1.1 ( 27)
Hypertension.selfreport % (freq) No data available/missing 0.0 ( 0) 3.2
no 24.3 ( 589)
yes 72.4 (1754)
<NA> 3.2 ( 78)
Hypertension.selfreportdrug % (freq) No data available/missing 0.0 ( 0) 4.4
no 29.9 ( 725)
yes 65.6 (1589)
<NA> 4.4 ( 107)
Hypertension.composite % (freq) No data available/missing 0.0 ( 0) 1.2
no 14.6 ( 353)
yes 84.3 (2040)
<NA> 1.2 ( 28)
Hypertension.drugs % (freq) No data available/missing 0.0 ( 0) 1.4
no 23.3 ( 565)
yes 75.3 (1823)
<NA> 1.4 ( 33)
Med.anticoagulants % (freq) No data available/missing 0.0 ( 0) 1.6
no 87.3 (2114)
yes 11.1 ( 269)
<NA> 1.6 ( 38)
Med.all.antiplatelet % (freq) No data available/missing 0.0 ( 0) 1.5
no 12.2 ( 295)
yes 86.3 (2090)
<NA> 1.5 ( 36)
Med.Statin.LLD % (freq) No data available/missing 0.0 ( 0) 1.4
no 20.3 ( 491)
yes 78.3 (1896)
<NA> 1.4 ( 34)
Stroke_Dx % (freq) Missing 0.0 ( 0) 6.9
No stroke diagnosed 71.5 (1731)
Stroke diagnosed 21.6 ( 524)
<NA> 6.9 ( 166)
sympt % (freq) missing 0.0 ( 0) 0.0
Asymptomatic 11.2 ( 270)
TIA 39.7 ( 961)
minor stroke 16.8 ( 407)
Major stroke 9.8 ( 238)
Amaurosis fugax 15.7 ( 379)
Four vessel disease 1.6 ( 38)
Vertebrobasilary TIA 0.2 ( 5)
Retinal infarction 1.4 ( 34)
Symptomatic, but aspecific symtoms 2.2 ( 53)
Contralateral symptomatic occlusion 0.5 ( 11)
retinal infarction 0.2 ( 6)
armclaudication due to occlusion subclavian artery, CEA needed for bypass 0.0 ( 1)
retinal infarction + TIAs 0.0 ( 0)
Ocular ischemic syndrome 0.7 ( 16)
ischemisch glaucoom 0.0 ( 0)
subclavian steal syndrome 0.1 ( 2)
TGA 0.0 ( 0)
Symptoms.5G % (freq) Asymptomatic 11.2 ( 270) 0.0
Ocular 16.3 ( 395)
Other 4.3 ( 105)
Retinal infarction 1.7 ( 40)
Stroke 26.6 ( 645)
TIA 39.9 ( 966)
AsymptSympt % (freq) Asymptomatic 11.2 ( 270) 0.0
Ocular and others 22.3 ( 540)
Symptomatic 66.5 (1611)
AsymptSympt2G % (freq) Asymptomatic 11.2 ( 270) 0.0
Symptomatic 88.8 (2151)
restenos % (freq) missing 0.0 ( 0) 1.4
de novo 93.7 (2268)
restenosis 4.9 ( 118)
stenose bij angioseal na PTCA 0.0 ( 0)
<NA> 1.4 ( 35)
stenose % (freq) missing 0.0 ( 0) 2.0
0-49% 0.5 ( 13)
50-70% 7.8 ( 189)
70-90% 46.6 (1127)
90-99% 38.3 ( 927)
100% (Occlusion) 1.3 ( 31)
NA 0.0 ( 1)
50-99% 0.6 ( 15)
70-99% 2.8 ( 68)
99 0.1 ( 2)
<NA> 2.0 ( 48)
CAD_history % (freq) Missing 0.0 ( 0) 1.9
No history CAD 66.8 (1618)
History CAD 31.2 ( 756)
<NA> 1.9 ( 47)
PAOD % (freq) missing/no data 0.0 ( 0) 2.0
no 77.5 (1876)
yes 20.5 ( 497)
<NA> 2.0 ( 48)
Peripheral.interv % (freq) no 77.2 (1868) 2.9
yes 19.9 ( 482)
<NA> 2.9 ( 71)
EP_composite % (freq) No data available. 0.0 ( 0) 5.0
No composite endpoints 70.6 (1709)
Composite endpoints 24.4 ( 590)
<NA> 5.0 ( 122)
EP_composite_time (mean (SD)) 2.479 (1.109) 5.2
EP_major % (freq) No data available. 0.0 ( 0) 5.0
No major events (endpoints) 83.3 (2016)
Major events (endpoints) 11.7 ( 283)
<NA> 5.0 ( 122)
EP_major_time (mean (SD)) 2.707 (0.977) 5.2
MAC_rankNorm (mean (SD)) 0.177 (0.952) 29.7
SMC_rankNorm (mean (SD)) -0.060 (0.962) 29.9
Macrophages.bin % (freq) no/minor 34.9 ( 846) 24.1
moderate/heavy 40.9 ( 991)
<NA> 24.1 ( 584)
SMC.bin % (freq) no/minor 24.9 ( 602) 23.8
moderate/heavy 51.3 (1242)
<NA> 23.8 ( 577)
Neutrophils_rankNorm (mean (SD)) 0.030 (0.951) 87.4
MastCells_rankNorm (mean (SD)) -0.010 (1.002) 90.0
IPH.bin % (freq) no 30.7 ( 744) 23.5
yes 45.8 (1108)
<NA> 23.5 ( 569)
VesselDensity_rankNorm (mean (SD)) 0.056 (0.981) 35.1
Calc.bin % (freq) no/minor 41.6 (1006) 23.4
moderate/heavy 35.1 ( 849)
<NA> 23.4 ( 566)
Collagen.bin % (freq) no/minor 15.8 ( 382) 23.6
moderate/heavy 60.6 (1467)
<NA> 23.6 ( 572)
Fat.bin_10 % (freq) <10% 22.4 ( 542) 23.3
>10% 54.3 (1314)
<NA> 23.3 ( 565)
Fat.bin_40 % (freq) <40% 56.2 (1360) 23.3
>40% 20.5 ( 496)
<NA> 23.3 ( 565)
OverallPlaquePhenotype % (freq) atheromatous 19.8 ( 480) 23.7
fibroatheromatous 27.8 ( 672)
fibrous 28.7 ( 695)
<NA> 23.7 ( 574)
Plaque_Vulnerability_Index % (freq) 0 29.5 ( 713) 0.0
1 14.3 ( 347)
2 19.7 ( 478)
3 22.1 ( 535)
4 10.4 ( 251)
5 4.0 ( 97)
Plasma_PCSK9 (mean (SD)) 31967.258 (18888.894) 54.8
Plasma_PCSK9_rankNorm (mean (SD)) -0.007 (1.016) 54.8
AEDB.CEA.Sex.tableOne = print(CreateTableOne(vars = basetable_vars,
# factorVars = basetable_bin,
strata = "Gender",
data = AEDB.CEA, includeNA = FALSE,
test = TRUE, addOverall = TRUE),
nonnormal = c(), missing = TRUE,
quote = FALSE, noSpaces = FALSE, showAllLevels = TRUE, explain = TRUE,
format = "pf",
contDigits = 3)[,1:6]Let’s save the baseline characteristics of the Athero-Express Biobank Study.
# Write basetable
require(openxlsx)
write.xlsx(file = paste0(BASELINE_loc, "/",Today,".",PROJECTNAME,".AE.BaselineTable.xlsx"),
AEDB.tableOne,
rowNames = TRUE,
colNames = TRUE,
sheetName = "AE_Base", overwrite = TRUE)
write.xlsx(file = paste0(BASELINE_loc, "/",Today,".",PROJECTNAME,".AE.CEA.BaselineTable.xlsx"),
AEDB.CEA.tableOne,
rowNames = TRUE,
colNames = TRUE,
sheetName = "AE_Base_CEA", overwrite = TRUE)
write.xlsx(file = paste0(BASELINE_loc, "/",Today,".",PROJECTNAME,".AE.CEA.Sex.BaselineTable.xlsx"),
AEDB.CEA.tableOne,
rowNames = TRUE,
colNames = TRUE,
sheetName = "AE_Base_CEA_sex", overwrite = TRUE)Let’s combine the full Athero-Express Biobank Study with the key-table containing the AEGS data.
NOTE: this should sum to 2,124 samples with genotypes.
AEGS <- merge(AEDB.full, AEGS123.sampleList.keytable, by.x = "STUDY_NUMBER", by.y = "STUDY_NUMBER", sort = FALSE,
all = TRUE)
dim(AEGS)[1] 3571 1137
AEGS$UPID.y <- NULL
names(AEGS)[names(AEGS) == "UPID.x"] <- "UPID"
AEGS$Age.y <- NULL
names(AEGS)[names(AEGS) == "Age.x"] <- "Age"
table(AEGS$CHIP, useNA = "ifany")
AffyAxiomCEU AffySNP5 IllGSA <NA>
918 687 519 1447
AEGS$GWAS <- AEGS$CHIP
AEGS$GWAS[is.na(AEGS$GWAS)] <- "not genotyped"
AEGS$GWAS[AEGS$GWAS != "not genotyped"] <- "genotyped"
table(AEGS$CHIP, AEGS$GWAS, useNA = "ifany")
genotyped not genotyped
AffyAxiomCEU 918 0
AffySNP5 687 0
IllGSA 519 0
<NA> 0 1447
Also a visualisation of the AEGS with AEDB overlaps.
library(UpSetR)
require(ggplot2)
require(plyr)
require(gridExtra)
require(grid)
AEDB.availGWAS = list(
AEGS1 = subset(AEGS, CHIP == "AffySNP5", select = c("STUDY_NUMBER"))[,1],
AEGS2 = subset(AEGS, CHIP == "AffyAxiomCEU", select = c("STUDY_NUMBER"))[,1],
AEGS3 = subset(AEGS, CHIP == "IllGSA", select = c("STUDY_NUMBER"))[,1],
AEDB = AEGS$STUDY_NUMBER)
p1 <- UpSetR::upset(fromList(AEDB.availGWAS),
sets = c("AEDB", "AEGS1", "AEGS2", "AEGS3"),
main.bar.color = c(uithof_color[15], uithof_color[2], uithof_color[3], uithof_color[21]),
mainbar.y.label = "intersection sample size",
sets.bar.color = c(uithof_color[15], uithof_color[2], uithof_color[3], uithof_color[21]),
sets.x.label = "sample size", keep.order = TRUE)
pdf(paste0(PLOT_loc, "/", Today, ".overlap.AEDB_AEGS123.UpSetR.pdf"))
p1
dev.off()quartz_off_screen
2
png(paste0(PLOT_loc, "/", Today, ".overlap.AEDB_AEGS123.UpSetR.png"))
p1dev.off()quartz_off_screen
2
p1rm(p1)table(AEGS$Artery_summary, AEGS$QC2018_FILTER)
family_discard family_keep issue passed
No artery known (yet), no surgery (patient ill, died, exited study), re-numbered to AAA 0 0 0 0
carotid (left & right) 20 21 40 1819
femoral/iliac (left, right or both sides) 1 0 0 99
other carotid arteries (common, external) 0 0 0 9
carotid bypass and injury (left, right or both sides) 0 0 0 1
aneurysmata (carotid & femoral) 0 0 0 0
aorta 0 0 0 0
other arteries (renal, popliteal, vertebral) 0 0 0 1
femoral bypass, angioseal and injury (left, right or both sides) 0 0 0 0
table(AEGS$informedconsent, AEGS$QC2018_FILTER)
family_discard family_keep issue passed
missing 0 0 0 0
no, died 0 0 0 0
yes 17 16 24 1411
yes, health treatment when possible 2 2 10 312
yes, no health treatment 2 2 2 91
yes, no health treatment, no commercial business 0 0 0 15
yes, no tissue, no commerical business 0 0 0 0
yes, no tissue, no questionnaires, no medical info, no commercial business 0 0 0 0
yes, no questionnaires, no health treatment, no commercial business 0 0 0 1
yes, no questionnaires, health treatment when possible 0 0 0 2
yes, no tissue, no questionnaires, no health treatment, no commerical business 0 0 0 0
yes, no health treatment, no medical info, no commercial business 0 0 3 10
yes, no tissue, no questionnaires, no health treatment, no medical info, no commercial business 0 0 0 0
yes, no questionnaires, no health treatment 0 0 0 0
yes, no tissue, no health treatment 0 0 0 0
yes, no tissue, no questionnaires 0 0 0 0
yes, no tissue, health treatment when possible 0 0 0 0
yes, no tissue 0 0 0 0
yes, no commerical business 0 1 0 35
yes, health treatment when possible, no commercial business 0 0 0 25
yes, no medical info, no commercial business 0 0 0 4
yes, no questionnaires 0 0 0 1
yes, no tissue, no questionnaires, no health treatment, no medical info 0 0 0 0
yes, no tissue, no questionnaires, no health treatment, no commercial business 0 0 0 0
yes, no medical info 0 0 1 5
yes, no questionnaires, no commercial business 0 0 0 0
yes, no questionnaires, no health treatment, no medical info 0 0 0 1
yes, no questionnaires, health treatment when possible, no commercial business 0 0 0 0
yes, no health treatment, no medical info 0 0 0 5
no, doesn't want to 0 0 0 0
no, unable to sign 0 0 0 0
no, no reaction 0 0 0 0
no, lost 0 0 0 0
no, too old 0 0 0 0
yes, no medical info, health treatment when possible 0 0 0 0
no (never asked for IC because there was no tissue) 0 0 0 0
yes, no medical info, no commercial business, health treatment when possible 0 0 0 2
no, endpoint 0 0 0 0
wil niets invullen, wel alles gebruiken 0 0 0 7
second informed concents: yes, no commercial business 0 0 0 2
nooit geincludeerd 0 0 0 0
AEGSselect <- subset(AEGS,
informedconsent != "missing" &
informedconsent != "no, died" &
informedconsent != "yes, no tissue, no commerical business" &
informedconsent != "yes, no tissue, no questionnaires, no medical info, no commercial business" &
informedconsent != "yes, no tissue, no questionnaires, no health treatment, no commerical business" &
informedconsent != "yes, no tissue, no questionnaires, no health treatment, no medical info, no commercial business" &
informedconsent != "yes, no tissue, no health treatment" &
informedconsent != "yes, no tissue, no questionnaires" &
informedconsent != "yes, no tissue, health treatment when possible" &
informedconsent != "yes, no tissue" &
informedconsent != "yes, no tissue, no questionnaires, no health treatment, no medical info" &
informedconsent != "yes, no tissue, no questionnaires, no health treatment, no commercial business" &
informedconsent != "no, doesn't want to" &
informedconsent != "no, unable to sign" &
informedconsent != "no, no reaction" &
informedconsent != "no, lost" &
informedconsent != "no, too old" &
informedconsent != "yes, no medical info, health treatment when possible" &
informedconsent != "no (never asked for IC because there was no tissue)" &
informedconsent != "no, endpoint" &
informedconsent != "nooit geincludeerd")
AEGSselect.CEA <- subset(AEGS, !is.na(QC2018_FILTER) & QC2018_FILTER != "issue" & QC2018_FILTER != "family_discard" &
(Artery_summary == "carotid (left & right)" | Artery_summary == "other carotid arteries (common, external)") & # we only want carotids
informedconsent != "missing" &
informedconsent != "no, died" &
informedconsent != "yes, no tissue, no commerical business" &
informedconsent != "yes, no tissue, no questionnaires, no medical info, no commercial business" &
informedconsent != "yes, no tissue, no questionnaires, no health treatment, no commerical business" &
informedconsent != "yes, no tissue, no questionnaires, no health treatment, no medical info, no commercial business" &
informedconsent != "yes, no tissue, no health treatment" &
informedconsent != "yes, no tissue, no questionnaires" &
informedconsent != "yes, no tissue, health treatment when possible" &
informedconsent != "yes, no tissue" &
informedconsent != "yes, no tissue, no questionnaires, no health treatment, no medical info" &
informedconsent != "yes, no tissue, no questionnaires, no health treatment, no commercial business" &
informedconsent != "no, doesn't want to" &
informedconsent != "no, unable to sign" &
informedconsent != "no, no reaction" &
informedconsent != "no, lost" &
informedconsent != "no, too old" &
informedconsent != "yes, no medical info, health treatment when possible" &
informedconsent != "no (never asked for IC because there was no tissue)" &
informedconsent != "no, endpoint" &
informedconsent != "nooit geincludeerd")
dim(AEGSselect)[1] 3458 1136
table(AEGSselect$Artery_summary, AEGSselect$QC2018_FILTER)
family_discard family_keep issue passed
No artery known (yet), no surgery (patient ill, died, exited study), re-numbered to AAA 0 0 0 0
carotid (left & right) 20 21 40 1819
femoral/iliac (left, right or both sides) 1 0 0 99
other carotid arteries (common, external) 0 0 0 9
carotid bypass and injury (left, right or both sides) 0 0 0 1
aneurysmata (carotid & femoral) 0 0 0 0
aorta 0 0 0 0
other arteries (renal, popliteal, vertebral) 0 0 0 1
femoral bypass, angioseal and injury (left, right or both sides) 0 0 0 0
table(AEGSselect$Artery_summary, AEGSselect$CHIP)
AffyAxiomCEU AffySNP5 IllGSA
No artery known (yet), no surgery (patient ill, died, exited study), re-numbered to AAA 0 0 0
carotid (left & right) 863 550 487
femoral/iliac (left, right or both sides) 0 72 28
other carotid arteries (common, external) 2 5 2
carotid bypass and injury (left, right or both sides) 0 1 0
aneurysmata (carotid & femoral) 0 0 0
aorta 0 0 0
other arteries (renal, popliteal, vertebral) 0 1 0
femoral bypass, angioseal and injury (left, right or both sides) 0 0 0
table(AEGSselect$QC2018_FILTER, AEGSselect$CHIP)
AffyAxiomCEU AffySNP5 IllGSA
family_discard 12 6 3
family_keep 8 1 12
issue 37 1 2
passed 808 621 500
table(AEGSselect$QC2018_FILTER, AEGSselect$SAMPLE_TYPE)
EDTA blood plaque unknown
family_discard 15 6 0
family_keep 13 8 0
issue 26 14 0
passed 1199 729 1
AEDB.temp <- subset(AEGSselect, select = c("STUDY_NUMBER", "UPID", "Age", "Gender", "Hospital", "Artery_summary", "QC2018_FILTER", "CHIP", "SAMPLE_TYPE"))
require(labelled)
AEDB.temp$Gender <- to_factor(AEDB.temp$Gender)
AEDB.temp$Hospital <- to_factor(AEDB.temp$Hospital)
AEDB.temp$Artery_summary <- to_factor(AEDB.temp$Artery_summary)
AEDB.temp$QC2018_FILTER <- to_factor(AEDB.temp$QC2018_FILTER)
AEDB.temp$CHIP <- to_factor(AEDB.temp$CHIP)
AEDB.temp$SAMPLE_TYPE <- to_factor(AEDB.temp$SAMPLE_TYPE)
DT::datatable(AEDB.temp[1:10,], caption = "Excerpt of the whole AEDB.", rownames = FALSE)
rm(AEDB.temp)Showing the baseline table of the Athero-Express Genomics Study.
# Create baseline tables
# http://rstudio-pubs-static.s3.amazonaws.com/13321_da314633db924dc78986a850813a50d5.html
AEGSselect$GWAS <- to_factor(AEGSselect$GWAS)
AEGSselect$CHIP <- to_factor(AEGSselect$CHIP)
AEGSselect$PCA <- to_factor(AEGSselect$PCA)
AEGSselect$SAMPLE_TYPE <- to_factor(AEGSselect$SAMPLE_TYPE)
AEGSselect$informedconsent <- to_factor(AEGSselect$informedconsent)
AEGSselect.CEA$GWAS <- to_factor(AEGSselect.CEA$GWAS)
AEGSselect.CEA$CHIP <- to_factor(AEGSselect.CEA$CHIP)
AEGSselect.CEA$PCA <- to_factor(AEGSselect.CEA$PCA)
AEGSselect.CEA$SAMPLE_TYPE <- to_factor(AEGSselect.CEA$SAMPLE_TYPE)
AEGSselect.CEA$informedconsent <- to_factor(AEGSselect.CEA$informedconsent)
cat("===========================================================================================\n")===========================================================================================
cat("CREATE BASELINE TABLE\n")CREATE BASELINE TABLE
# Baseline table variables
basetable_vars = c("Hospital",
"Age", "Gender",
"TC_final", "LDL_final", "HDL_final", "TG_final",
"systolic", "diastoli", "GFR_MDRD", "BMI",
"KDOQI", "BMI_WHO",
"SmokerCurrent", "eCigarettes", "ePackYearsSmoking",
"DiabetesStatus", "Hypertension.selfreport", "Hypertension.selfreportdrug", "Hypertension.composite",
"Hypertension.drugs", "Med.anticoagulants", "Med.all.antiplatelet", "Med.Statin.LLD",
"Stroke_Dx", "sympt", "Symptoms.5G", "restenos",
"EP_composite", "EP_composite_time",
"macmean0", "smcmean0", "Macrophages.bin", "SMC.bin", "neutrophils", "Mast_cells_plaque", "vessel_density_averaged",
"IPH.bin",
"Calc.bin", "Collagen.bin",
"Fat.bin_10", "Fat.bin_40", "OverallPlaquePhenotype", "Plaque_Vulnerability_Index",
"SMC_rankNorm", "MAC_rankNorm", "Neutrophils_rankNorm", "MastCells_rankNorm", "VesselDensity_rankNorm",
"Plasma_PCSK9", "Plasma_PCSK9_rankNorm",
"GWAS", "CHIP", "PCA")
basetable_bin = c("Gender",
"KDOQI", "BMI_WHO",
"SmokerCurrent",
"DiabetesStatus", "Hypertension.selfreport", "Hypertension.selfreportdrug", "Hypertension.composite",
"Hypertension.drugs", "Med.anticoagulants", "Med.all.antiplatelet", "Med.Statin.LLD",
"Stroke_Dx", "sympt", "Symptoms.5G", "restenos",
"EP_composite", "Macrophages.bin", "SMC.bin",
"IPH.bin",
"Calc.bin", "Collagen.bin",
"Fat.bin_10", "Fat.bin_40", "OverallPlaquePhenotype", "Plaque_Vulnerability_Index",
"GWAS", "CHIP", "PCA")
basetable_bin [1] "Gender" "KDOQI" "BMI_WHO" "SmokerCurrent" "DiabetesStatus"
[6] "Hypertension.selfreport" "Hypertension.selfreportdrug" "Hypertension.composite" "Hypertension.drugs" "Med.anticoagulants"
[11] "Med.all.antiplatelet" "Med.Statin.LLD" "Stroke_Dx" "sympt" "Symptoms.5G"
[16] "restenos" "EP_composite" "Macrophages.bin" "SMC.bin" "IPH.bin"
[21] "Calc.bin" "Collagen.bin" "Fat.bin_10" "Fat.bin_40" "OverallPlaquePhenotype"
[26] "Plaque_Vulnerability_Index" "GWAS" "CHIP" "PCA"
basetable_con = basetable_vars[!basetable_vars %in% basetable_bin]
basetable_con [1] "Hospital" "Age" "TC_final" "LDL_final" "HDL_final" "TG_final"
[7] "systolic" "diastoli" "GFR_MDRD" "BMI" "eCigarettes" "ePackYearsSmoking"
[13] "EP_composite_time" "macmean0" "smcmean0" "neutrophils" "Mast_cells_plaque" "vessel_density_averaged"
[19] "SMC_rankNorm" "MAC_rankNorm" "Neutrophils_rankNorm" "MastCells_rankNorm" "VesselDensity_rankNorm" "Plasma_PCSK9"
[25] "Plasma_PCSK9_rankNorm"
All Athero-Express Genomics Study data (n = 2,011), compared to the remaining, _un_genotyped Athero-Express Biobank Study.
cat("\n===========================================================================================\n")
cat("DISPLAY BASELINE TABLE\n")
AEGSselect.tableOne = print(CreateTableOne(vars = basetable_vars,
# factorVars = basetable_bin,
strata = "GWAS",
data = AEGSselect, includeNA = TRUE),
nonnormal = c(), missing = TRUE,
quote = FALSE, noSpaces = FALSE, showAllLevels = TRUE, explain = TRUE,
format = "pf",
contDigits = 3)[,1:6]Baseline of the valid, CEA and genotyped data.
AEGSselect.CEA.tableOne = print(CreateTableOne(vars = basetable_vars,
# factorVars = basetable_bin,
strata = "Gender",
data = AEGSselect.CEA, includeNA = TRUE),
nonnormal = c(), missing = TRUE,
quote = FALSE, noSpaces = FALSE, showAllLevels = TRUE, explain = TRUE,
format = "pf",
contDigits = 3)[,1:6]Let’s save the baseline characteristics of the Athero-Express Genomics Study.
# Write basetable
require(openxlsx)
write.xlsx(file = paste0(BASELINE_loc, "/",Today,".",PROJECTNAME,".AEGS.BaselineTable.xlsx"),
AEGSselect.tableOne,
rowNames = TRUE,
colNames = TRUE,
sheetName = "AEGS_Base_AEDB", overwrite = TRUE)
write.xlsx(file = paste0(BASELINE_loc, "/",Today,".",PROJECTNAME,".AEGS.CEA.BaselineTable.xlsx"),
AEGSselect.CEA.tableOne,
rowNames = TRUE,
colNames = TRUE,
sheetName = "AEGS_Base_CEA_sex", overwrite = TRUE)We are ready to make a sampleList for use with the imputed data.
require(openxlsx)
temp <- subset(AEGS,
GWAS == "genotyped",
select = c("ID_1", "ID_2", "UPID", "STUDY_NUMBER", # ID_2 is the order of samples!
"QC2018_FINAL", "QC2018_FILTER", "OriginalOrder_postMichImp_QC",
"AEGS_type", "CHIP", "STUDY_TYPE", "SAMPLE_TYPE", "PCA",
"PC1", "PC2", "PC3", "PC4", "PC5",
"PC6", "PC7", "PC8", "PC9", "PC10",
"Sex", "Age", "ORyear",
"BMI",
"Calc.bin", "Collagen.bin",
"Fat.bin_10", "Fat.bin_40", "IPH.bin",
"SMC_rankNorm", "MAC_rankNorm", "Neutrophils_rankNorm", "MastCells_rankNorm", "VesselDensity_rankNorm",
"OverallPlaquePhenotype", "Plaque_Vulnerability_Index",
"Plasma_PCSK9", "Plasma_PCSK9_rankNorm")) # Select some phenotype of interest
dim(temp)[1] 2124 40
# Fix things
attach(temp)
temp[,"Calcification"] <- NA
temp$Calcification[Calc.bin == "no/minor"] <- "control"
temp$Calcification[Calc.bin == "moderate/heavy"] <- "case"
temp[,"Collagen"] <- NA
temp$Collagen[Collagen.bin == "no/minor"] <- "control"
temp$Collagen[Collagen.bin == "moderate/heavy"] <- "case"
temp[,"Fat10"] <- NA
temp$Fat10[Fat.bin_10 == "<10%"] <- "control"
temp$Fat10[Fat.bin_10 == ">10%"] <- "case"
temp[,"Fat40"] <- NA
temp$Fat40[Fat.bin_40 == "<40%"] <- "control"
temp$Fat40[Fat.bin_40 == ">40%"] <- "case"
temp[,"IPH"] <- NA
temp$IPH[IPH.bin == "no"] <- "control"
temp$IPH[IPH.bin == "yes"] <- "case"
temp$Plasma_PCSK9_C <- temp$Plasma_PCSK9
temp$Plasma_PCSK9_rankNorm_C <- temp$Plasma_PCSK9_rankNorm
detach(temp)
# Making selection variable
attach(temp)
temp[,"SELECTION"] <- "not_selected"
temp$SELECTION[(QC2018_FILTER=="passed" | QC2018_FILTER=="family_keep") & (STUDY_TYPE=="CEA" & PCA=="EUR")] <- "selected"
detach(temp)
table(temp$SELECTION, temp$QC2018_FILTER)
family_discard family_keep issue passed
not_selected 23 0 40 153
selected 0 21 0 1887
table(temp$SELECTION, temp$STUDY_TYPE)
CEA FEA Other
not_selected 94 109 12
selected 1908 0 0
table(temp$SELECTION, temp$PCA)
EUR nonEUR
not_selected 167 45
selected 1908 0
AEGS123_sample.list <- temp[order(temp$OriginalOrder_postMichImp_QC),]
AEGS123_sample.list$missing <- 0
sample_file_aegs <- dplyr::select(AEGS123_sample.list,
ID_1, ID_2, missing, # ID_2 is the order of samples - that way we always know what the order should be
UPID, STUDY_NUMBER,
QC2018_FINAL, QC2018_FILTER, SELECTION,
AEGS_type, CHIP, STUDY_TYPE, SAMPLE_TYPE, PCA,
PC1, PC2, PC3, PC4, PC5, PC6, PC7, PC8, PC9, PC10,
Sex, Age, ORyear,
BMI, Plasma_PCSK9_C, Plasma_PCSK9_rankNorm_C,
Calcification, Collagen,
Fat10, Fat40, IPH,
SMC_rankNorm, MAC_rankNorm, Neutrophils_rankNorm, MastCells_rankNorm, VesselDensity_rankNorm,
OverallPlaquePhenotype, Plaque_Vulnerability_Index,
Plasma_PCSK9, Plasma_PCSK9_rankNorm) %>%
mutate_if(is.numeric, as.character) %>%
mutate(SAMPLE_TYPE = gsub(' ', '_', SAMPLE_TYPE)) %>%
add_row(.before = 1,
ID_1 = "0", ID_2 = "0", missing = "0",
UPID = "D", STUDY_NUMBER = "C",
QC2018_FINAL = "D", QC2018_FILTER = "D", SELECTION = "D",
AEGS_type = "D", CHIP = "D", STUDY_TYPE = "D", SAMPLE_TYPE = "D", PCA = "D",
PC1 = "C", PC2 = "C", PC3 = "C", PC4 = "C", PC5 = "C", PC6 = "C", PC7 = "C", PC8 = "C", PC9 = "C", PC10 = "C",
Sex = "D", Age = "C", ORyear = "C",
BMI = "C", Plasma_PCSK9_C = "C", Plasma_PCSK9_rankNorm_C = "C",
Calcification = "B", Collagen = "B",
Fat10 = "B", Fat40 = "B", IPH = "B",
SMC_rankNorm = "P", MAC_rankNorm = "P", Neutrophils_rankNorm = "P", MastCells_rankNorm = "P", VesselDensity_rankNorm = "P",
OverallPlaquePhenotype = "P", Plaque_Vulnerability_Index = "P",
Plasma_PCSK9 = "P", Plasma_PCSK9_rankNorm = "P") %>% ## identifiers: index for these is 1, and all base variables have 0 as identifier
print()dim(sample_file_aegs)[1] 2125 43
fwrite(sample_file_aegs,
file = paste0(SNP_loc, "/",Today,".",PROJECTNAME,".AEGS123.sample"),
na = "NA", sep = "\t", quote = FALSE,
row.names = FALSE, col.names = TRUE,
showProgress = TRUE, verbose = TRUE)This installation of data.table has not been compiled with OpenMP support.
omp_get_num_procs() 1
R_DATATABLE_NUM_PROCS_PERCENT unset (default 50)
R_DATATABLE_NUM_THREADS unset
R_DATATABLE_THROTTLE unset (default 1024)
omp_get_thread_limit() 1
omp_get_max_threads() 1
OMP_THREAD_LIMIT unset
OMP_NUM_THREADS unset
RestoreAfterFork true
data.table is using 1 threads with throttle==1024. See ?setDTthreads.
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
Column writers: 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12
args.doRowNames=0 args.rowNames=0 doQuote=0 args.nrow=2125 args.ncol=43 eolLen=1
maxLineLen=1042. Found in 0.001s
Writing bom (false), yaml (0 characters) and column names (true) ... done in 0.003s
Writing 2125 rows in 1 batches of 2125 rows (each buffer size 8MB, showProgress=1, nth=1)
require(DT)
DT::datatable(sample_file_aegs, caption = "AEGS: final sample list of genotyped AE patients after quality control.", rownames = FALSE)Warning in instance$preRenderHook(instance) :
It seems your data is too big for client-side DataTables. You may consider server-side processing: https://rstudio.github.io/DT/server.html
Warning in instance$preRenderHook(instance) :
It seems your data is too big for client-side DataTables. You may consider server-side processing: https://rstudio.github.io/DT/server.html
rm(temp)This is the selection for females only.
require(openxlsx)
temp <- subset(AEGS,
GWAS == "genotyped",
select = c("ID_1", "ID_2", "UPID", "STUDY_NUMBER", # ID_2 is the order of samples!
"QC2018_FINAL", "QC2018_FILTER", "OriginalOrder_postMichImp_QC",
"AEGS_type", "CHIP", "STUDY_TYPE", "SAMPLE_TYPE", "PCA",
"PC1", "PC2", "PC3", "PC4", "PC5",
"PC6", "PC7", "PC8", "PC9", "PC10",
"Sex", "Age", "ORyear",
"BMI",
"Calc.bin", "Collagen.bin",
"Fat.bin_10", "Fat.bin_40", "IPH.bin",
"SMC_rankNorm", "MAC_rankNorm", "Neutrophils_rankNorm", "MastCells_rankNorm", "VesselDensity_rankNorm",
"OverallPlaquePhenotype", "Plaque_Vulnerability_Index",
"Plasma_PCSK9", "Plasma_PCSK9_rankNorm")) # Select some phenotype of interest
dim(temp)[1] 2124 40
# Fix things
attach(temp)
temp[,"Calcification"] <- NA
temp$Calcification[Calc.bin == "no/minor"] <- "control"
temp$Calcification[Calc.bin == "moderate/heavy"] <- "case"
temp[,"Collagen"] <- NA
temp$Collagen[Collagen.bin == "no/minor"] <- "control"
temp$Collagen[Collagen.bin == "moderate/heavy"] <- "case"
temp[,"Fat10"] <- NA
temp$Fat10[Fat.bin_10 == "<10%"] <- "control"
temp$Fat10[Fat.bin_10 == ">10%"] <- "case"
temp[,"Fat40"] <- NA
temp$Fat40[Fat.bin_40 == "<40%"] <- "control"
temp$Fat40[Fat.bin_40 == ">40%"] <- "case"
temp[,"IPH"] <- NA
temp$IPH[IPH.bin == "no"] <- "control"
temp$IPH[IPH.bin == "yes"] <- "case"
temp$Plasma_PCSK9_C <- temp$Plasma_PCSK9
temp$Plasma_PCSK9_rankNorm_C <- temp$Plasma_PCSK9_rankNorm
detach(temp)
# Making selection variable
attach(temp)
temp[,"SELECTION"] <- "not_selected"
temp$SELECTION[(QC2018_FILTER=="passed" | QC2018_FILTER=="family_keep") & (STUDY_TYPE=="CEA" & PCA=="EUR") & Sex=="F"] <- "selected"
detach(temp)
table(temp$SELECTION, temp$QC2018_FILTER)
family_discard family_keep issue passed
not_selected 23 15 40 1428
selected 0 6 0 612
table(temp$SELECTION, temp$STUDY_TYPE)
CEA FEA Other
not_selected 1384 109 12
selected 618 0 0
table(temp$SELECTION, temp$PCA)
EUR nonEUR
not_selected 1457 45
selected 618 0
AEGS123_sample.list <- temp[order(temp$OriginalOrder_postMichImp_QC),]
AEGS123_sample.list$missing <- 0
sample_file_aegsF <- dplyr::select(AEGS123_sample.list,
ID_1, ID_2, missing, # ID_2 is the order of samples - that way we always know what the order should be
UPID, STUDY_NUMBER,
QC2018_FINAL, QC2018_FILTER, SELECTION,
AEGS_type, CHIP, STUDY_TYPE, SAMPLE_TYPE, PCA,
PC1, PC2, PC3, PC4, PC5, PC6, PC7, PC8, PC9, PC10,
Sex, Age, ORyear,
BMI, Plasma_PCSK9_C, Plasma_PCSK9_rankNorm_C,
Calcification, Collagen,
Fat10, Fat40, IPH,
SMC_rankNorm, MAC_rankNorm, Neutrophils_rankNorm, MastCells_rankNorm, VesselDensity_rankNorm,
OverallPlaquePhenotype, Plaque_Vulnerability_Index,
Plasma_PCSK9, Plasma_PCSK9_rankNorm) %>%
mutate_if(is.numeric, as.character) %>%
mutate(SAMPLE_TYPE = gsub(' ', '_', SAMPLE_TYPE)) %>%
add_row(.before = 1,
ID_1 = "0", ID_2 = "0", missing = "0",
UPID = "D", STUDY_NUMBER = "C",
QC2018_FINAL = "D", QC2018_FILTER = "D", SELECTION = "D",
AEGS_type = "D", CHIP = "D", STUDY_TYPE = "D", SAMPLE_TYPE = "D", PCA = "D",
PC1 = "C", PC2 = "C", PC3 = "C", PC4 = "C", PC5 = "C", PC6 = "C", PC7 = "C", PC8 = "C", PC9 = "C", PC10 = "C",
Sex = "D", Age = "C", ORyear = "C",
BMI = "C", Plasma_PCSK9_C = "C", Plasma_PCSK9_rankNorm_C = "C",
Calcification = "B", Collagen = "B",
Fat10 = "B", Fat40 = "B", IPH = "B",
SMC_rankNorm = "P", MAC_rankNorm = "P", Neutrophils_rankNorm = "P", MastCells_rankNorm = "P", VesselDensity_rankNorm = "P",
OverallPlaquePhenotype = "P", Plaque_Vulnerability_Index = "P",
Plasma_PCSK9 = "P", Plasma_PCSK9_rankNorm = "P") %>% ## identifiers: index for these is 1, and all base variables have 0 as identifier
print()dim(sample_file_aegsF)[1] 2125 43
fwrite(sample_file_aegsF,
file = paste0(SNP_loc, "/",Today,".",PROJECTNAME,".AEGS123.females.sample"),
na = "NA", sep = "\t", quote = FALSE,
row.names = FALSE, col.names = TRUE,
showProgress = TRUE, verbose = TRUE)This installation of data.table has not been compiled with OpenMP support.
omp_get_num_procs() 1
R_DATATABLE_NUM_PROCS_PERCENT unset (default 50)
R_DATATABLE_NUM_THREADS unset
R_DATATABLE_THROTTLE unset (default 1024)
omp_get_thread_limit() 1
omp_get_max_threads() 1
OMP_THREAD_LIMIT unset
OMP_NUM_THREADS unset
RestoreAfterFork true
data.table is using 1 threads with throttle==1024. See ?setDTthreads.
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
Column writers: 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12
args.doRowNames=0 args.rowNames=0 doQuote=0 args.nrow=2125 args.ncol=43 eolLen=1
maxLineLen=1042. Found in 0.001s
Writing bom (false), yaml (0 characters) and column names (true) ... done in 0.002s
Writing 2125 rows in 1 batches of 2125 rows (each buffer size 8MB, showProgress=1, nth=1)
require(DT)
DT::datatable(sample_file_aegsF, caption = "AEGS: final sample list of genotyped AE patients after quality control.", rownames = FALSE)Warning in instance$preRenderHook(instance) :
It seems your data is too big for client-side DataTables. You may consider server-side processing: https://rstudio.github.io/DT/server.html
Warning in instance$preRenderHook(instance) :
It seems your data is too big for client-side DataTables. You may consider server-side processing: https://rstudio.github.io/DT/server.html
rm(temp)This is the selection for males only.
require(openxlsx)
temp <- subset(AEGS,
GWAS == "genotyped",
select = c("ID_1", "ID_2", "UPID", "STUDY_NUMBER", # ID_2 is the order of samples!
"QC2018_FINAL", "QC2018_FILTER", "OriginalOrder_postMichImp_QC",
"AEGS_type", "CHIP", "STUDY_TYPE", "SAMPLE_TYPE", "PCA",
"PC1", "PC2", "PC3", "PC4", "PC5",
"PC6", "PC7", "PC8", "PC9", "PC10",
"Sex", "Age", "ORyear",
"BMI",
"Calc.bin", "Collagen.bin",
"Fat.bin_10", "Fat.bin_40", "IPH.bin",
"SMC_rankNorm", "MAC_rankNorm", "Neutrophils_rankNorm", "MastCells_rankNorm", "VesselDensity_rankNorm",
"OverallPlaquePhenotype", "Plaque_Vulnerability_Index",
"Plasma_PCSK9", "Plasma_PCSK9_rankNorm")) # Select some phenotype of interest
dim(temp)[1] 2124 40
# Fix things
attach(temp)
temp[,"Calcification"] <- NA
temp$Calcification[Calc.bin == "no/minor"] <- "control"
temp$Calcification[Calc.bin == "moderate/heavy"] <- "case"
temp[,"Collagen"] <- NA
temp$Collagen[Collagen.bin == "no/minor"] <- "control"
temp$Collagen[Collagen.bin == "moderate/heavy"] <- "case"
temp[,"Fat10"] <- NA
temp$Fat10[Fat.bin_10 == "<10%"] <- "control"
temp$Fat10[Fat.bin_10 == ">10%"] <- "case"
temp[,"Fat40"] <- NA
temp$Fat40[Fat.bin_40 == "<40%"] <- "control"
temp$Fat40[Fat.bin_40 == ">40%"] <- "case"
temp[,"IPH"] <- NA
temp$IPH[IPH.bin == "no"] <- "control"
temp$IPH[IPH.bin == "yes"] <- "case"
temp$Plasma_PCSK9_C <- temp$Plasma_PCSK9
temp$Plasma_PCSK9_rankNorm_C <- temp$Plasma_PCSK9_rankNorm
detach(temp)
# Making selection variable
attach(temp)
temp[,"SELECTION"] <- "not_selected"
temp$SELECTION[(QC2018_FILTER=="passed" | QC2018_FILTER=="family_keep") & (STUDY_TYPE=="CEA" & PCA=="EUR") & Sex=="M"] <- "selected"
detach(temp)
table(temp$SELECTION, temp$QC2018_FILTER)
family_discard family_keep issue passed
not_selected 23 6 40 765
selected 0 15 0 1275
table(temp$SELECTION, temp$STUDY_TYPE)
CEA FEA Other
not_selected 712 109 12
selected 1290 0 0
table(temp$SELECTION, temp$PCA)
EUR nonEUR
not_selected 785 45
selected 1290 0
AEGS123_sample.list <- temp[order(temp$OriginalOrder_postMichImp_QC),]
AEGS123_sample.list$missing <- 0
sample_file_aegsM <- dplyr::select(AEGS123_sample.list,
ID_1, ID_2, missing, # ID_2 is the order of samples - that way we always know what the order should be
UPID, STUDY_NUMBER,
QC2018_FINAL, QC2018_FILTER, SELECTION,
AEGS_type, CHIP, STUDY_TYPE, SAMPLE_TYPE, PCA,
PC1, PC2, PC3, PC4, PC5, PC6, PC7, PC8, PC9, PC10,
Sex, Age, ORyear,
BMI, Plasma_PCSK9_C, Plasma_PCSK9_rankNorm_C,
Calcification, Collagen,
Fat10, Fat40, IPH,
SMC_rankNorm, MAC_rankNorm, Neutrophils_rankNorm, MastCells_rankNorm, VesselDensity_rankNorm,
OverallPlaquePhenotype, Plaque_Vulnerability_Index,
Plasma_PCSK9, Plasma_PCSK9_rankNorm) %>%
mutate_if(is.numeric, as.character) %>%
mutate(SAMPLE_TYPE = gsub(' ', '_', SAMPLE_TYPE)) %>%
add_row(.before = 1,
ID_1 = "0", ID_2 = "0", missing = "0",
UPID = "D", STUDY_NUMBER = "C",
QC2018_FINAL = "D", QC2018_FILTER = "D", SELECTION = "D",
AEGS_type = "D", CHIP = "D", STUDY_TYPE = "D", SAMPLE_TYPE = "D", PCA = "D",
PC1 = "C", PC2 = "C", PC3 = "C", PC4 = "C", PC5 = "C", PC6 = "C", PC7 = "C", PC8 = "C", PC9 = "C", PC10 = "C",
Sex = "D", Age = "C", ORyear = "C",
BMI = "C", Plasma_PCSK9_C = "C", Plasma_PCSK9_rankNorm_C = "C",
Calcification = "B", Collagen = "B",
Fat10 = "B", Fat40 = "B", IPH = "B",
SMC_rankNorm = "P", MAC_rankNorm = "P", Neutrophils_rankNorm = "P", MastCells_rankNorm = "P", VesselDensity_rankNorm = "P",
OverallPlaquePhenotype = "P", Plaque_Vulnerability_Index = "P",
Plasma_PCSK9 = "P", Plasma_PCSK9_rankNorm = "P") %>% ## identifiers: index for these is 1, and all base variables have 0 as identifier
print()dim(sample_file_aegsM)[1] 2125 43
fwrite(sample_file_aegsM,
file = paste0(SNP_loc, "/",Today,".",PROJECTNAME,".AEGS123.males.sample"),
na = "NA", sep = "\t", quote = FALSE,
row.names = FALSE, col.names = TRUE,
showProgress = TRUE, verbose = TRUE)This installation of data.table has not been compiled with OpenMP support.
omp_get_num_procs() 1
R_DATATABLE_NUM_PROCS_PERCENT unset (default 50)
R_DATATABLE_NUM_THREADS unset
R_DATATABLE_THROTTLE unset (default 1024)
omp_get_thread_limit() 1
omp_get_max_threads() 1
OMP_THREAD_LIMIT unset
OMP_NUM_THREADS unset
RestoreAfterFork true
data.table is using 1 threads with throttle==1024. See ?setDTthreads.
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
Column writers: 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12
args.doRowNames=0 args.rowNames=0 doQuote=0 args.nrow=2125 args.ncol=43 eolLen=1
maxLineLen=1042. Found in 0.001s
Writing bom (false), yaml (0 characters) and column names (true) ... done in 0.002s
Writing 2125 rows in 1 batches of 2125 rows (each buffer size 8MB, showProgress=1, nth=1)
require(DT)
DT::datatable(sample_file_aegsM, caption = "AEGS: final sample list of genotyped AE patients after quality control.", rownames = FALSE)Warning in instance$preRenderHook(instance) :
It seems your data is too big for client-side DataTables. You may consider server-side processing: https://rstudio.github.io/DT/server.html
Warning in instance$preRenderHook(instance) :
It seems your data is too big for client-side DataTables. You may consider server-side processing: https://rstudio.github.io/DT/server.html
rm(temp)The X-chromosome data is taken from previously imputed data based on 1000G phase 3 (version 5) and GoNL5. For some reason, imputing on the Michigan Imputation Server was not successful (ACTION point).
Here we load in the sample files for the three datasets of the X chromosomal data. We should:
AEGS123_chrX <- fread(paste0(MICHIMP_loc, "/_chr23_1kg_gonl5/aegs.raw.1kg_gonl5.chr23.mappings.txt"))
names(AEGS123_chrX)[names(AEGS123_chrX) == "ID_1"] <- "SampleID_postImpChrX"
AEGS123_AllChr <- merge(AEGS123_chrX, sample_file_aegs, by.x = "SampleID_postMichImp", by.y = "ID_1",
all.x = TRUE,
sort = FALSE)
names(AEGS123_AllChr)[names(AEGS123_AllChr) == "ChrX_Order"] <- "ID_2" # this is the order of the chr X data!
names(AEGS123_AllChr)[names(AEGS123_AllChr) == "STUDY_TYPE.y"] <- "STUDY_TYPE" # this indicates the artery information
names(AEGS123_AllChr)[names(AEGS123_AllChr) == "SampleID_postMichImp"] <- "ID_1" # this should be the sampleID
AEGS123_AllChr$missing.x <- NULL
AEGS123_AllChr$missing.y <- NULL
AEGS123_AllChr$STUDY_TYPE.x <- NULL # this is useless, we use the STUDY_TYPE.y because this contains artery information
AEGS123_AllChr$ID_2.x <- NULL # we remove this because this is the order of the autosomal data.
AEGS123_AllChr$ID_2.y <- NULL # we remove this because this is the order of the autosomal data.
AEGS123_AllChr$UPID <- NULL # we remove this because this is the order of the autosomal data.
names(AEGS123_AllChr)[names(AEGS123_AllChr) == "FID_forQC"] <- "UPID"
dim(AEGS123_AllChr)[1] 2176 46
str(AEGS123_AllChr)Classes ‘data.table’ and 'data.frame': 2176 obs. of 46 variables:
$ ID_1 : chr "0" "UPID00126-UPID00126-A318-0034" "UPID01799-UPID01799-A318-0492" "UPID01890-UPID01890-A318-0151" ...
$ SampleID_postImpChrX : chr "0" "UPID00126-UPID00126-A318-0034" "UPID01799-UPID01799-A318-0492" "UPID01890-UPID01890-A318-0151" ...
$ MappingID : chr "0" "UPID00126-UPID00126-A318-0034" "UPID01799-UPID01799-A318-0492" "UPID01890-UPID01890-A318-0151" ...
$ UPID : chr "0" "UPID00126" "UPID01799" "UPID01890" ...
$ IID_forQC : chr "0" "UPID00126-UPID00126-A318-0034" "UPID01799-UPID01799-A318-0492" "UPID01890-UPID01890-A318-0151" ...
$ SampleID_postQC : chr "0" "UPID00126-UPID00126-A318-0034" "UPID01799-UPID01799-A318-0492" "UPID01890-UPID01890-A318-0151" ...
$ ID_2 : int 0 1 2 NA 3 4 5 6 NA 7 ...
$ STUDY_NUMBER : chr "C" "901" "1164" NA ...
$ QC2018_FINAL : chr "D" NA NA NA ...
$ QC2018_FILTER : chr "D" "passed" "passed" NA ...
$ SELECTION : chr "D" "selected" "selected" NA ...
$ AEGS_type : chr "D" "AEGS1" "AEGS1" NA ...
$ CHIP : chr "D" "AffySNP5" "AffySNP5" NA ...
$ STUDY_TYPE : chr "D" "CEA" "CEA" NA ...
$ SAMPLE_TYPE : chr "D" "plaque" "plaque" NA ...
$ PCA : chr "D" "EUR" "EUR" NA ...
$ PC1 : chr "C" "-0.0316" "-0.0131" NA ...
$ PC2 : chr "C" "0.0161" "-0.0069" NA ...
$ PC3 : chr "C" "-0.01" "0.0383" NA ...
$ PC4 : chr "C" "-0.0131" "-0.005" NA ...
$ PC5 : chr "C" "0.0059" "0.0078" NA ...
$ PC6 : chr "C" "-0.0283" "0.0243" NA ...
$ PC7 : chr "C" "-2e-04" "-0.018" NA ...
$ PC8 : chr "C" "0.0033" "0.0022" NA ...
$ PC9 : chr "C" "-0.0317" "-0.0154" NA ...
$ PC10 : chr "C" "-0.0054" "0.007" NA ...
$ Sex : chr "D" "F" "M" NA ...
$ Age : chr "C" NA NA NA ...
$ ORyear : chr "C" NA NA NA ...
$ BMI : chr "C" NA NA NA ...
$ Plasma_PCSK9_C : chr "C" NA NA NA ...
$ Plasma_PCSK9_rankNorm_C : chr "C" NA NA NA ...
$ Calcification : chr "B" NA NA NA ...
$ Collagen : chr "B" NA NA NA ...
$ Fat10 : chr "B" NA NA NA ...
$ Fat40 : chr "B" NA NA NA ...
$ IPH : chr "B" NA NA NA ...
$ SMC_rankNorm : chr "P" NA NA NA ...
$ MAC_rankNorm : chr "P" NA NA NA ...
$ Neutrophils_rankNorm : chr "P" NA NA NA ...
$ MastCells_rankNorm : chr "P" NA NA NA ...
$ VesselDensity_rankNorm : chr "P" NA NA NA ...
$ OverallPlaquePhenotype : chr "P" NA NA NA ...
$ Plaque_Vulnerability_Index: chr "P" NA NA NA ...
$ Plasma_PCSK9 : chr "P" NA NA NA ...
$ Plasma_PCSK9_rankNorm : chr "P" NA NA NA ...
- attr(*, ".internal.selfref")=<externalptr>
This seems fine, let’s filter; we can use this file to filter the genetic data. And we create another file to re-order the data.
AEGS123_AllChrQC <- subset(AEGS123_AllChr,
!is.na(QC2018_FILTER) & !is.na(ID_2),
select = c("ID_1", "ID_2", "UPID", "STUDY_NUMBER", "SampleID_postImpChrX",
"QC2018_FINAL", "QC2018_FILTER", "SELECTION",
"AEGS_type", "CHIP", "STUDY_TYPE", "SAMPLE_TYPE",
"PC1", "PC2", "PC3", "PC4", "PC5",
"PC6", "PC7", "PC8", "PC9", "PC10",
"Sex", "Age", "ORyear",
"BMI", "Plasma_PCSK9_C", "Plasma_PCSK9_rankNorm_C",
"Calcification", "Collagen",
"Fat10", "Fat40", "IPH",
"SMC_rankNorm", "MAC_rankNorm", "Neutrophils_rankNorm", "MastCells_rankNorm", "VesselDensity_rankNorm",
"OverallPlaquePhenotype", "Plaque_Vulnerability_Index",
"Plasma_PCSK9", "Plasma_PCSK9_rankNorm"))
AEGS123_AllChrQC_reorder <-AEGS123_AllChrQC[order(AEGS123_AllChrQC$ID_2),] # remember: ID_2 is the order of samples
AEGS123_AllChrQC_filtered <- subset(AEGS123_AllChrQC_reorder,
!is.na(QC2018_FILTER),
select = c("ID_1", "ID_2", "UPID", "STUDY_NUMBER", "SampleID_postImpChrX",
"QC2018_FINAL", "QC2018_FILTER", "SELECTION",
"AEGS_type", "CHIP", "STUDY_TYPE", "SAMPLE_TYPE",
"PC1", "PC2", "PC3", "PC4", "PC5",
"PC6", "PC7", "PC8", "PC9", "PC10",
"Sex", "Age", "ORyear",
"BMI", "Plasma_PCSK9_C", "Plasma_PCSK9_rankNorm_C",
"Calcification", "Collagen",
"Fat10", "Fat40", "IPH",
"SMC_rankNorm", "MAC_rankNorm", "Neutrophils_rankNorm", "MastCells_rankNorm", "VesselDensity_rankNorm",
"OverallPlaquePhenotype", "Plaque_Vulnerability_Index",
"Plasma_PCSK9", "Plasma_PCSK9_rankNorm"))
fwrite(AEGS123_AllChrQC_filtered,
file = paste0(SNP_loc, "/",Today,".",PROJECTNAME,".AEGS123.chrX.sample"),
na = "NA", sep = "\t", quote = FALSE,
row.names = FALSE, col.names = TRUE,
showProgress = TRUE, verbose = TRUE)This installation of data.table has not been compiled with OpenMP support.
omp_get_num_procs() 1
R_DATATABLE_NUM_PROCS_PERCENT unset (default 50)
R_DATATABLE_NUM_THREADS unset
R_DATATABLE_THROTTLE unset (default 1024)
omp_get_thread_limit() 1
omp_get_max_threads() 1
OMP_THREAD_LIMIT unset
OMP_NUM_THREADS unset
RestoreAfterFork true
data.table is using 1 threads with throttle==1024. See ?setDTthreads.
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
Column writers: 12 3 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12
args.doRowNames=0 args.rowNames=0 doQuote=0 args.nrow=2012 args.ncol=42 eolLen=1
maxLineLen=1127. Found in 0.001s
Writing bom (false), yaml (0 characters) and column names (true) ... done in 0.001s
Writing 2012 rows in 1 batches of 2012 rows (each buffer size 8MB, showProgress=1, nth=1)
require(DT)
DT::datatable(AEGS123_AllChrQC, caption = "AEGS: final sample list of genotyped AE patients after quality control (chromosome X).", rownames = FALSE)Warning in instance$preRenderHook(instance) :
It seems your data is too big for client-side DataTables. You may consider server-side processing: https://rstudio.github.io/DT/server.html
Warning in instance$preRenderHook(instance) :
It seems your data is too big for client-side DataTables. You may consider server-side processing: https://rstudio.github.io/DT/server.html
dim(AEGS123_AllChrQC)[1] 2012 42
Here we create a variantlist.txt file used by GWASToolKit for analysis.
variant_list
temp <- subset(variant_list, select = c("VariantID", "Chr", "BP"))
fwrite(temp,
file = paste0(SNP_loc, "/variantlist.txt"),
na = "NA", sep = "\t", quote = FALSE,
row.names = FALSE, col.names = FALSE,
showProgress = TRUE, verbose = TRUE)This installation of data.table has not been compiled with OpenMP support.
omp_get_num_procs() 1
R_DATATABLE_NUM_PROCS_PERCENT unset (default 50)
R_DATATABLE_NUM_THREADS unset
R_DATATABLE_THROTTLE unset (default 1024)
omp_get_thread_limit() 1
omp_get_max_threads() 1
OMP_THREAD_LIMIT unset
OMP_NUM_THREADS unset
RestoreAfterFork true
data.table is using 1 threads with throttle==1024. See ?setDTthreads.
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
Column writers: 12 5 5
args.doRowNames=0 args.rowNames=0 doQuote=0 args.nrow=5 args.ncol=3 eolLen=1
maxLineLen=140. Found in 0.000s
Writing bom (false), yaml (0 characters) and column names (false) ... done in 0.001s
Writing 5 rows in 1 batches of 5 rows (each buffer size 8MB, showProgress=1, nth=1)
rm(temp)Here we create a covariates.txt file used by GWASToolKit for analysis.
library(tidyverse)
# for 'overall' analyses
c("Age Sex PC1 PC2 CHIP ORyear") %>% write_lines(paste0(SNP_loc, "/covariates.txt"))
# for sex-specific analyses
c("Age PC1 PC2 CHIP ORyear") %>% write_lines(paste0(SNP_loc, "/covariates.sex.txt"))Here we create a phenotypes.txt file used by GWASToolKit for analysis.
library(tidyverse)
c("Calcification", "Collagen", "Fat10", "Fat40", "IPH", "SMC_rankNorm", "MAC_rankNorm", "Neutrophils_rankNorm", "MastCells_rankNorm", "VesselDensity_rankNorm", "OverallPlaquePhenotype", "Plaque_Vulnerability_Index", "Plasma_PCSK9", "Plasma_PCSK9_rankNorm") %>% write_lines(paste0(SNP_loc, "/phenotypes.txt"))Version: v1.1.0
Last update: 2021-09-29
Written by: Sander W. van der Laan (s.w.vanderlaan-2[at]umcutrecht.nl).
Description: Script to get some Athero-Express Biobank Study baseline characteristics.
Minimum requirements: R version 3.4.3 (2017-06-30) -- 'Single Candle', Mac OS X El Capitan
Changes log
* v1.1.0 Major update to WORCS system.
* v1.0.6 Small bug fixes.
* v1.0.5 Added png for overlap-figure.
* v1.0.5 Removed obsolete references to objects.
* v1.0.4 Fixed a mistake in the chr X sample-file creation. Now the order matches the chr X data.
* v1.0.3 Fixed weight of files (limit of 10Mb per file for templates). Renamed entire repo.
* v1.0.2 Added sex-specific .sample-files. Added GWASToolKit input-files.
* v1.0.0 Initial version. Add 'plaque vulnerability index', Fixed baseline table, added codes, and results. Created sample-files.
sessionInfo()R version 4.1.1 (2021-08-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.6
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] grid tools stats graphics grDevices utils datasets methods base
other attached packages:
[1] gridExtra_2.3 plyr_1.8.6 labelled_2.8.0 openxlsx_4.2.4 UpSetR_1.4.0 ggpubr_0.4.0 forestplot_2.0.1 checkmate_2.0.0
[9] magrittr_2.0.1 pheatmap_1.0.12 devtools_2.4.2 usethis_2.0.1 BlandAltmanLeh_0.3.1 tableone_0.13.0 haven_2.4.3 eeptools_1.2.4
[17] DT_0.18 knitr_1.33 forcats_0.5.1 stringr_1.4.0 purrr_0.3.4 tibble_3.1.3 ggplot2_3.3.5 tidyverse_1.3.1
[25] data.table_1.14.0 naniar_0.6.1 tidyr_1.1.3 dplyr_1.0.7 optparse_1.6.6 readr_2.0.0
loaded via a namespace (and not attached):
[1] utf8_1.2.2 tidyselect_1.1.1 lme4_1.1-27.1 htmlwidgets_1.5.3 ranger_0.13.1 maptools_1.1-1 munsell_0.5.0 codetools_0.2-18
[9] effectsize_0.4.5 withr_2.4.2 colorspace_2.0-2 rstudioapi_0.13 ggsignif_0.6.2 vcd_1.4-8 labeling_0.4.2 emmeans_1.6.2-1
[17] rticles_0.20 bit64_4.0.5 farver_2.1.0 datawizard_0.1.0 rprojroot_2.0.2 coda_0.19-4 vctrs_0.3.8 generics_0.1.0
[25] TH.data_1.0-10 xfun_0.25 R6_2.5.0 arm_1.11-2 cachem_1.0.5 assertthat_0.2.1 vroom_1.5.4 scales_1.1.1
[33] multcomp_1.4-17 nnet_7.3-16 gtable_0.3.0 processx_3.5.2 sandwich_3.0-1 rlang_0.4.11 sjPlot_2.8.9 splines_4.1.1
[41] rstatix_0.7.0 broom_0.7.9 BiocManager_1.30.16 yaml_2.2.1 abind_1.4-5 modelr_0.1.8 crosstalk_1.1.1 backports_1.2.1
[49] Hmisc_4.5-0 ellipsis_0.3.2 jquerylib_0.1.4 RColorBrewer_1.1-2 proxy_0.4-26 sessioninfo_1.1.1 Rcpp_1.0.7 base64enc_0.1-3
[57] ps_1.6.0 prettyunits_1.1.1 rpart_4.1-15 openssl_1.4.4 zoo_1.8-9 cluster_2.1.2 fs_1.5.0 survey_4.1-1
[65] lmtest_0.9-38 reprex_2.0.1 worcs_0.1.8 mvtnorm_1.1-2 sjmisc_2.8.7 pkgload_1.2.1 hms_1.1.0 evaluate_0.14
[73] xtable_1.8-4 rio_0.5.27 sjstats_0.18.1 jpeg_0.1-9 readxl_1.3.1 ggeffects_1.1.1 testthat_3.0.4 compiler_4.1.1
[81] credentials_1.3.1 crayon_1.4.1 minqa_1.2.4 htmltools_0.5.1.1 tzdb_0.1.2 Formula_1.2-4 visdat_0.5.3 lubridate_1.7.10
[89] DBI_1.1.1 sjlabelled_1.1.8 dbplyr_2.1.1 MASS_7.3-54 boot_1.3-28 sys_3.4 Matrix_1.3-4 getopt_1.20.3
[97] car_3.0-11 cli_3.0.1 mitools_2.4 parallel_4.1.1 insight_0.14.2 pkgconfig_2.0.3 foreign_0.8-81 sp_1.4-5
[105] xml2_1.3.2 bslib_0.2.5.1 estimability_1.3 rvest_1.0.1 callr_3.7.0 digest_0.6.27 parameters_0.14.0 rmarkdown_2.10
[113] cellranger_1.1.0 htmlTable_2.2.1 curl_4.3.2 nloptr_1.2.2.2 lifecycle_1.0.0 nlme_3.1-152 jsonlite_1.7.2 carData_3.0-4
[121] desc_1.3.0 askpass_1.1 fansi_0.5.0 pillar_1.6.2 lattice_0.20-44 prereg_0.5.0 fastmap_1.1.0 httr_1.4.2
[129] pkgbuild_1.2.0 survival_3.2-11 glue_1.4.2 remotes_2.4.0 bayestestR_0.10.5 zip_2.2.0 gert_1.3.1 png_0.1-7
[137] bit_4.0.4 pander_0.6.4 class_7.3-19 stringi_1.7.3 sass_0.4.0 performance_0.7.3 latticeExtra_0.6-29 memoise_2.0.0
[145] renv_0.14.0 e1071_1.7-9
save.image(paste0(PROJECT_loc, "/",Today,".",PROJECTNAME,".results.RData"))| © 1979-2021 Sander W. van der Laan | s.w.vanderlaan[at]gmail.com | swvanderlaan.github.io. |